Hook: ‘Which one was best?’
Two weeks from now someone will ask: which was better, `lr=3e-5` or `lr=5e-5`? If the answer only lives in a notebook, you will have forgotten it. Experiment tracking is the backbone of reproducibility.
What to track
- Code version (git commit SHA).
- Config / hyperparameters (YAML/JSON).
- Dataset version + hash.
- Environment (Python, CUDA, package versions).
- Metrics (train/val loss, accuracy per epoch).
- Artifacts (model weights, plots, sample predictions).
- System metrics (GPU utilization, memory, wall-clock time).
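A minimal sketch, using only the standard library, of collecting the code, config, data and environment items from the list above into one record per run. The function names (`git_sha`, `file_sha256`, `run_metadata`) are hypothetical, and checking `torch` is just an example package; a tracker such as W&B or MLflow would store this record as run config or tags.

```python
import hashlib
import platform
import subprocess
from importlib import metadata
from pathlib import Path


def git_sha() -> str:
    """Current commit SHA, or 'unknown' if git is missing or this is not a repo."""
    try:
        out = subprocess.run(["git", "rev-parse", "HEAD"],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the dataset in chunks so large files never need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def package_version(name: str) -> str:
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def run_metadata(config: dict, dataset: Path) -> dict:
    """One dict covering code, config, data and environment for a single run."""
    return {
        "code": {"git_sha": git_sha()},
        "config": config,
        "data": {"path": str(dataset), "sha256": file_sha256(dataset)},
        "env": {
            "python": platform.python_version(),
            "platform": platform.platform(),
            "torch": package_version("torch"),  # example package
        },
    }
```

If the data hash or git SHA differs between two runs, that explains a metric gap before any hyperparameter does.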
Tool Landscape
W&B (Weights & Biases)
+ polished UI, built-in sweeps, shareable reports
+ free tier for personal use
MLflow
+ open-source, easy to self-host
+ tracking + model registry + serving
TensorBoard
+ bundled with TensorFlow; PyTorch writes to it via torch.utils.tensorboard
- basic features, mostly local use
Neptune / CometML / ClearML
+ enterprise features
DVC
+ data + pipeline versioning (git-like)
W&B: Quick Start
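train_wandb.py
import random

import wandb

# Hypothetical stubs: replace with your real training / validation code
def train_step() -> float:
    return random.random()

def eval_step() -> float:
    return random.random()

run = wandb.init(
    project="house-price",
    name="xgb-lr0.05-d4",
    config={"lr": 0.05, "depth": 4, "n_est": 800, "epochs": 10},  # epochs=10 is an assumed value
)
for epoch in range(run.config.epochs):
    train_loss = train_step()
    val_loss = eval_step()
    wandb.log({"train/loss": train_loss, "val/loss": val_loss, "epoch": epoch})

# save the trained model file as a versioned artifact (model.pkl must already exist)
art = wandb.Artifact("model", type="model")
art.add_file("model.pkl")
wandb.log_artifact(art)
wandb.finish()
MLflow: Self-host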
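train_mlflow.py
import mlflow
import mlflow.sklearn

mlflow.set_tracking_uri("http://mlflow.internal:5000")  # your self-hosted server
mlflow.set_experiment("spam-detection")

with mlflow.start_run(run_name="distilbert-3ep"):
    mlflow.log_params({"model": "distilbert", "lr": 2e-5, "epochs": 3})
    for ep in range(3):
        f1 = train_one_epoch_and_eval(ep)  # hypothetical helper returning validation F1
        mlflow.log_metric("val_f1", f1, step=ep)
    # `pipe` = your trained scikit-learn Pipeline (defined elsewhere);
    # for a Hugging Face model use mlflow.transformers.log_model instead
    mlflow.sklearn.log_model(pipe, artifact_path="model")
    mlflow.log_artifact("confusion_matrix.png")  # file must already exist locally
Hyperparameter Sweep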
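sweep.yaml (W&B)
program: train_wandb.py   # script the agent runs for each trial
method: bayes
metric: {name: val/loss, goal: minimize}
parameters:
  # 1.0e-5, not 1e-5: YAML 1.1 parsers such as PyYAML read 1e-5 as a string
  lr: {min: 1.0e-5, max: 1.0e-3, distribution: log_uniform_values}
  depth: {values: [3, 4, 5, 6]}
  n_est: {values: [400, 800, 1200]}
  dropout: {min: 0.0, max: 0.5}
Run: `wandb sweep sweep.yaml`, then `wandb agent <id>` with the sweep ID it prints. The `method` key accepts grid, random, or bayes. Inside the training script, read hyperparameters from `wandb.config` so each trial's values take effect.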
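To make the search space concrete, here is a minimal sketch of what sampling from it means. It swaps in plain random search, since `bayes` additionally fits a surrogate model to past trials, and it is not W&B's implementation; the space is hand-copied from the YAML, and `sample` / `random_search` are hypothetical names.

```python
import math
import random

# Hand-copied from sweep.yaml (W&B parses the YAML itself)
SPACE = {
    "lr": {"min": 1e-5, "max": 1e-3, "distribution": "log_uniform_values"},
    "depth": {"values": [3, 4, 5, 6]},
    "n_est": {"values": [400, 800, 1200]},
    "dropout": {"min": 0.0, "max": 0.5},
}


def sample(spec: dict, rng: random.Random):
    if "values" in spec:                      # categorical: pick one uniformly
        return rng.choice(spec["values"])
    lo, hi = spec["min"], spec["max"]
    if spec.get("distribution") == "log_uniform_values":
        # uniform in log-space: 1e-5..1e-4 is as likely as 1e-4..1e-3
        return math.exp(rng.uniform(math.log(lo), math.log(hi)))
    return rng.uniform(lo, hi)                # plain uniform between min and max


def random_search(objective, n_trials: int, seed: int = 0):
    """Return (best_loss, best_params); goal is minimize, as in sweep.yaml."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {name: sample(spec, rng) for name, spec in SPACE.items()}
        loss = objective(params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best
```

Log-uniform sampling matters for learning rates: a linear uniform draw over [1e-5, 1e-3] would land above 1e-4 about 90% of the time.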
Model Registry & Stages
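- Registry: the central store for production-ready models.
- Stages: None → Staging → Production → Archived (MLflow's classic workflow; recent MLflow versions deprecate stages in favor of aliases).
- Promotion rules: a version moves up only if it meets a minimum metric threshold and has been signed off.
- The inference server loads whichever version is in the ‘Production’ stage from the registry.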
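A minimal in-memory sketch of the promotion rule above: minimum metric plus sign-off, with the previous Production version archived rather than deleted. `ModelVersion`, `promote`, `production_version` and the 0.90 F1 threshold are all hypothetical; a real registry (e.g. MLflow's) stores this state on the server.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelVersion:
    name: str
    version: int
    val_f1: float
    signed_off: bool = False
    stage: str = "None"  # one of: None, Staging, Production, Archived


def promote(candidate: ModelVersion,
            current_prod: Optional[ModelVersion],
            min_f1: float = 0.90) -> bool:
    """Move a Staging version to Production if it passes the gate."""
    if candidate.stage != "Staging":
        raise ValueError("only Staging versions can be promoted")
    # Gate: minimum metric AND a human sign-off
    if candidate.val_f1 < min_f1 or not candidate.signed_off:
        return False
    if current_prod is not None:
        current_prod.stage = "Archived"  # keep the old version, just retire it
    candidate.stage = "Production"
    return True


def production_version(versions: List[ModelVersion]) -> ModelVersion:
    """What an inference server would load: the single Production version."""
    prod = [v for v in versions if v.stage == "Production"]
    if len(prod) != 1:
        raise LookupError(f"expected one Production version, found {len(prod)}")
    return prod[0]
```

Archiving instead of deleting keeps rollback to the previous version a one-step change.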
Best Practices
- Don't rely on run names alone; make runs searchable with tags, groups, and notes.
- Keep the config in one YAML file; don't hardcode it in the code.
- Log the random seed; without it a run can't be reproduced exactly.
- Keep failed runs logged too; they help with debugging.
- Version artifacts; never overwrite ‘best_model.pkl’.
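A minimal sketch of two of these practices: seeding, and naming a model file after a hash of its config so a new run never overwrites ‘best_model.pkl’. `set_seed` and `versioned_path` are hypothetical helpers; only Python's `random` is seeded here, and the comment notes where NumPy/PyTorch seeding would go.

```python
import hashlib
import json
import random
from pathlib import Path


def set_seed(seed: int) -> None:
    random.seed(seed)
    # with NumPy/PyTorch installed you would also call
    # np.random.seed(seed) and torch.manual_seed(seed)


def versioned_path(out_dir: Path, stem: str, config: dict,
                   suffix: str = ".pkl") -> Path:
    """File name derived from the config hash instead of a fixed 'best_model.pkl'."""
    # sort_keys makes the hash independent of dict insertion order
    digest = hashlib.sha256(
        json.dumps(config, sort_keys=True).encode()).hexdigest()[:8]
    path = out_dir / f"{stem}-{digest}{suffix}"
    if path.exists():
        raise FileExistsError(path)  # never silently overwrite an artifact
    return path
```

Putting the seed inside the config means two runs that differ only by seed also get different file names.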
Summary
At a glance
Tracking = code + config + data + metrics + artifacts. W&B is quick to set up; MLflow is open-source and self-hostable. Sweeps plus a registry make you production-ready. What you don't track, you can't reproduce.