Experiment Tracking: W&B, MLflow (Machine Learning)

Hook: 'Which one was best?'

Two weeks from now someone will ask: which was better, `lr=3e-5` or `lr=5e-5`? If the results only lived in a notebook, you won't remember. Experiment tracking is the backbone of reproducibility.

What to track

Code version (git commit SHA).
Config / hyperparameters (YAML/JSON).
Dataset version + hash.
Environment (Python, CUDA, and package versions).
Metrics (train/val loss, accuracy per epoch).
Artifacts (model weights, plots, sample predictions).
System stats (GPU utilization, memory, run time).
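Here is a minimal sketch that collects the code, data, and environment items above into one dict, using only the Python standard library. The helper names (`git_commit_sha`, `file_sha256`, `package_versions`, `run_metadata`) are hypothetical. It assumes `git` is on the PATH; otherwise the SHA is recorded as `None`. CUDA is left out to keep the sketch dependency-free. With PyTorch installed, `torch.version.cuda` could be added to the `env` entry.

```python
from __future__ import annotations

import hashlib
import importlib.metadata
import platform
import subprocess
from pathlib import Path


def git_commit_sha() -> str | None:
    """Current git commit SHA, or None outside a repo or when git is missing."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None


def file_sha256(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Hash a dataset file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def package_versions(names: list[str]) -> dict[str, str | None]:
    """Installed version of each package, or None if it is not installed."""
    versions: dict[str, str | None] = {}
    for name in names:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def run_metadata(config: dict, dataset_path: str | Path, packages: list[str]) -> dict:
    """Bundle code version, config, dataset hash and environment into one dict."""
    return {
        "git_sha": git_commit_sha(),
        "config": config,
        "dataset": {"path": str(dataset_path), "sha256": file_sha256(dataset_path)},
        "env": {
            "python": platform.python_version(),
            "platform": platform.platform(),
            "packages": package_versions(packages),
        },
    }
```

You could pass the resulting dict to `wandb.init(config=...)` or save it with `mlflow.log_dict(meta, "run_metadata.json")`. Metrics and artifacts are then logged by the tracking tool during training.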

Tool Landscape

compare

W&B (Weights & Biases)
  + best UI, sweeps, report sharing
  + free tier for personal use
MLflow
  + open-source, self-host easy
  + tracking + model registry + serving
TensorBoard
  + ships with PyTorch/TF
  - basic UI, self-served; no built-in sweep runner or model registry
Neptune / CometML / ClearML
  + enterprise features
DVC
  + data + pipeline version (git-like)

W&B — Quick Start

train_wandb.py

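import wandb

run = wandb.init(
    project="house-price",
    name="xgb-lr0.05-d4",
    # "epochs": 10 is an illustrative value, not from the original setup
    config={"lr": 0.05, "depth": 4, "n_est": 800, "epochs": 10},
)
cfg = run.config  # read hyperparameters back from W&B so a sweep can override them

# train_step() / eval_step(): your own training and validation code (not shown)
for epoch in range(cfg.epochs):
    train_loss = train_step()
    val_loss = eval_step()
    run.log({"train/loss": train_loss, "val/loss": val_loss, "epoch": epoch})

# save the trained model file (written by your training code) as a versioned artifact
art = wandb.Artifact("model", type="model")
art.add_file("model.pkl")
run.log_artifact(art)
run.finish()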

MLflow — Self-host

train_mlflow.py

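import mlflow

mlflow.set_tracking_uri("http://mlflow.internal:5000")  # self-hosted tracking server
mlflow.set_experiment("spam-detection")

with mlflow.start_run(run_name="distilbert-3ep"):
    mlflow.log_params({"model": "distilbert", "lr": 2e-5, "epochs": 3})
    for ep in range(3):
        f1 = train_one_epoch_and_eval(ep)  # hypothetical: your own code, returns val F1
        mlflow.log_metric("val_f1", f1, step=ep)
    # pipe: the trained Hugging Face pipeline (not shown). A DistilBERT model needs the
    # transformers flavor; mlflow.sklearn.log_model is for scikit-learn models.
    mlflow.transformers.log_model(transformers_model=pipe, artifact_path="model")
    mlflow.log_artifact("confusion_matrix.png")  # the PNG must already exist on disk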
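Both tools answer the hook's question ('which lr was better?') by querying logged runs. In MLflow that is `mlflow.search_runs(experiment_names=["spam-detection"], order_by=["metrics.val_f1 DESC"])`. To show the idea without a server, here is a library-free sketch. It replaces the tracking backend with a hypothetical append-only JSON-lines file, and `log_run` and `best_run` are illustrative names.

```python
import json
from pathlib import Path


def log_run(log_file: Path, name: str, params: dict, metrics: dict) -> None:
    """Append one finished run as a JSON line; earlier runs are never rewritten."""
    record = {"name": name, "params": params, "metrics": metrics}
    with log_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def best_run(log_file: Path, metric: str, mode: str = "max") -> dict:
    """Return the run with the best value of `metric` ('max' or 'min')."""
    if mode not in ("max", "min"):
        raise ValueError("mode must be 'max' or 'min'")
    with log_file.open(encoding="utf-8") as f:
        runs = [json.loads(line) for line in f if line.strip()]
    runs = [r for r in runs if metric in r["metrics"]]  # skip runs that never logged it
    if not runs:
        raise ValueError(f"no run logged metric {metric!r}")
    pick = max if mode == "max" else min
    return pick(runs, key=lambda r: r["metrics"][metric])
```

Appending instead of rewriting keeps failed and superseded runs available for later comparison. A real tracker also stores each run's config alongside it, so the winning `params` can be reused directly.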

Hyperparameter Sweep

sweep.yaml (W&B)

program: train_wandb.py   # script the agent launches for each trial
method: bayes
metric: {name: val/loss, goal: minimize}
parameters:
  lr:        {min: 1e-5, max: 1e-3, distribution: log_uniform_values}
  depth:     {values: [3, 4, 5, 6]}
  n_est:     {values: [400, 800, 1200]}
  dropout:   {min: 0.0, max: 0.5}

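Run `wandb sweep sweep.yaml`, then `wandb agent <id>` with the sweep ID it prints. Grid, random, and Bayesian search are all supported. The training script must read its hyperparameters from `wandb.config` rather than hardcoded values; otherwise every trial trains the same model.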
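`method: bayes` uses the results of earlier trials to choose the next configuration. The simpler `random` method samples the space independently. This sketch swaps in plain random search (not W&B's Bayesian optimizer) to show how the search space above is read:

- `values` is a uniform choice.
- `log_uniform_values` samples uniformly in log-space between `min` and `max`.
- A float `min`/`max` pair with no distribution (dropout) is treated as uniform.

`SPACE`, `sample`, and `random_search` are hypothetical names. The space is written out by hand rather than parsed from the YAML.

```python
from __future__ import annotations

import math
import random

# Hand-written mirror of the sweep.yaml search space (not parsed from the file).
SPACE = {
    "lr": ("log_uniform_values", 1e-5, 1e-3),
    "depth": ("values", [3, 4, 5, 6]),
    "n_est": ("values", [400, 800, 1200]),
    "dropout": ("uniform", 0.0, 0.5),
}


def sample(space: dict, rng: random.Random) -> dict:
    """Draw one configuration from the search space."""
    cfg = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "values":
            cfg[name] = rng.choice(spec[1])
        elif kind == "uniform":
            cfg[name] = rng.uniform(spec[1], spec[2])
        elif kind == "log_uniform_values":
            # uniform in log-space: each decade between min and max is equally likely
            lo, hi = math.log(spec[1]), math.log(spec[2])
            cfg[name] = math.exp(rng.uniform(lo, hi))
        else:
            raise ValueError(f"unsupported distribution {kind!r}")
    return cfg


def random_search(objective, space: dict, n_trials: int, seed: int = 0) -> tuple[dict | None, float]:
    """Minimise objective(cfg) over n_trials random samples; returns (best_cfg, best_loss)."""
    rng = random.Random(seed)
    best_cfg, best_loss = None, math.inf
    for _ in range(n_trials):
        cfg = sample(space, rng)
        loss = objective(cfg)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss
```

Log-uniform sampling matters for `lr`. A plain uniform draw between 1e-5 and 1e-3 would put about 91% of samples above 1e-4. Log-uniform splits them evenly between the two decades.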

Model Registry & Stages

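Registry: a central store for production-ready models.
Stages: None → Staging → Production → Archived (MLflow's classic stages; MLflow 2.9 and later deprecate stages in favor of model version aliases and tags).
Promotion rule: a model moves up only if it clears a minimum metric and has been signed off.
The inference server loads whichever model version is in 'Production' from the registry.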
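The stage flow and promotion rule above can be sketched as a toy in-memory registry. The class and method names (`Registry`, `register`, `promote`, `production`) are hypothetical and are not MLflow's API. In MLflow itself, an inference server would load a URI such as `models:/<name>/Production` (or `models:/<name>@<alias>` in newer versions) with `mlflow.pyfunc.load_model`.

```python
from __future__ import annotations

from dataclasses import dataclass, field

STAGES = ("None", "Staging", "Production", "Archived")


@dataclass
class ModelVersion:
    name: str
    version: int
    metrics: dict
    stage: str = "None"
    signed_off_by: str | None = None


@dataclass
class Registry:
    """Toy in-memory model registry (hypothetical, not MLflow's API)."""
    gate_metric: str  # metric checked before Production, e.g. "val_f1"
    gate_min: float   # minimum value required for that metric
    versions: list[ModelVersion] = field(default_factory=list)

    def register(self, name: str, metrics: dict) -> ModelVersion:
        """Add a new version of `name`; numbering starts at 1 per model name."""
        version = 1 + sum(v.name == name for v in self.versions)
        mv = ModelVersion(name, version, metrics)
        self.versions.append(mv)
        return mv

    def promote(self, mv: ModelVersion, to: str) -> None:
        """Move mv to stage `to`, enforcing the promotion rule for Production."""
        if to not in STAGES:
            raise ValueError(f"unknown stage {to!r}")
        if to == "Production":
            if mv.stage != "Staging":
                raise ValueError("only Staging versions can go to Production")
            if mv.metrics.get(self.gate_metric, float("-inf")) < self.gate_min:
                raise ValueError(f"{self.gate_metric} below {self.gate_min}")
            if mv.signed_off_by is None:
                raise PermissionError("Production promotion needs a sign-off")
            # the previous Production version of the same model is archived
            for other in self.versions:
                if other is not mv and other.name == mv.name and other.stage == "Production":
                    other.stage = "Archived"
        mv.stage = to

    def production(self, name: str) -> ModelVersion | None:
        """What an inference server would load: the current Production version."""
        for mv in self.versions:
            if mv.name == name and mv.stage == "Production":
                return mv
        return None
```

Keeping exactly one Production version per model name is what lets the server always load "the" production model without knowing its version number.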

Best Practices

Don't rely on run names alone; make runs searchable with tags, groups, and notes.
Keep config in one YAML file, not hardcoded in the code.
Log the seed, for reproducibility.
Keep failed runs logged too; they help with debugging.
Version artifacts; don't overwrite 'best_model.pkl'.
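The sketch below illustrates two of these practices with hypothetical helpers:

- `set_seed` seeds Python's `random` module and returns the seed so it can be logged.
- `save_versioned` writes `best_model-v1.pkl`, `best_model-v2.pkl`, and so on, instead of overwriting one file.

W&B artifacts and the MLflow registry already version files for you, so this only illustrates the idea. If you use NumPy or PyTorch, their generators need seeding too (`np.random.seed`, `torch.manual_seed`).

```python
import random
from pathlib import Path


def set_seed(seed: int) -> int:
    """Seed Python's RNG and return the seed so it can be logged with the run.

    Extend with np.random.seed(seed) / torch.manual_seed(seed) if you use them.
    """
    random.seed(seed)
    return seed


def save_versioned(out_dir: Path, stem: str, suffix: str, data: bytes) -> Path:
    """Write data to out_dir/<stem>-vN<suffix> using the next free N; never overwrites."""
    out_dir.mkdir(parents=True, exist_ok=True)
    n = 1
    while True:
        path = out_dir / f"{stem}-v{n}{suffix}"
        try:
            with path.open("xb") as f:  # "x" mode fails if the file already exists
                f.write(data)
            return path
        except FileExistsError:
            n += 1
```

Opening with `"x"` mode makes the no-overwrite check atomic at the filesystem level, so two runs saving at once cannot clobber each other's file.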

Summary

At a glance

Tracking = code + config + data + metrics + artifacts. W&B is quick to get started with; MLflow is open-source and self-hostable. Sweeps + registry = production readiness. What you don't track, you can't reproduce.