Model Monitoring in Machine Learning

Hook: Deployment Is the Beginning, Not the End

The world changes, users change, data changes, but the model stays fixed. Without monitoring, you will never know when the model has gone stale. Monitoring is the most underrated part of production ML.

What Should We Monitor?

Operational: latency (p50/p95/p99), throughput, error rate, uptime.
Data quality: missing values, range violations, schema breaks.
Data drift: a change in the input distribution.
Concept drift: a change in the input-output relationship.
Model performance: accuracy, AUC (once ground-truth labels arrive).
Business KPIs: conversion, revenue.
Fairness: performance per subgroup.
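Below is a minimal sketch of the data-quality checks listed above. It assumes each batch arrives as a list of dicts. The `SCHEMA` dict, its columns and bounds, and `check_batch` are hypothetical illustrations, not part of any library.

```python
from typing import Any

# Hypothetical expected schema: column -> (allowed types, min, max)
SCHEMA: dict[str, tuple[tuple[type, ...], float, float]] = {
    "age": ((int,), 18, 100),
    "income": ((int, float), 0.0, 1e7),
}

def check_batch(rows: list[dict[str, Any]]) -> dict[str, int]:
    """Count missing values, range violations and schema breaks in a batch."""
    issues = {"missing": 0, "range_violation": 0, "schema_break": 0}
    for row in rows:
        # Schema break, kind 1: columns the model was never trained on
        if set(row) - set(SCHEMA):
            issues["schema_break"] += 1
        for col, (types, lo, hi) in SCHEMA.items():
            val = row.get(col)
            if val is None:                    # absent key or explicit null
                issues["missing"] += 1
            elif not isinstance(val, types):   # schema break, kind 2: wrong type
                issues["schema_break"] += 1
            elif not lo <= val <= hi:          # value outside the expected range
                issues["range_violation"] += 1
    return issues
```

In production these counts would typically be exported as metrics and alerted on, rather than inspected by hand.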

Data vs Concept Drift

The Difference

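Data drift means P(X) has changed, for example when the user demographic shifts. Concept drift means P(y|X) has changed, for example when spending patterns changed after COVID, so the same inputs now lead to different outcomes.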
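Here is a toy illustration of the difference. It uses a hypothetical one-feature model with made-up thresholds. Under data drift the inputs move but the labelling rule holds. Under concept drift the inputs stay put but the rule changes.

```python
def model(x: float) -> int:
    # Frozen model: it learned the original rule y = 1 if x > 0.5
    return int(x > 0.5)

def accuracy(xs: list[float], ys: list[int]) -> float:
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(xs)

def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)

# Reference window: x = 0.0, 0.1, ..., 0.9 labelled by the original rule
xs_ref = [i / 10 for i in range(10)]
ys_ref = [int(x > 0.5) for x in xs_ref]

# Data drift: inputs shift upward (P(X) changes), labelling rule unchanged
xs_data = [x + 0.3 for x in xs_ref]
ys_data = [int(x > 0.5) for x in xs_data]

# Concept drift: identical inputs (P(X) unchanged), but the true rule moves to x > 0.7
xs_concept = list(xs_ref)
ys_concept = [int(x > 0.7) for x in xs_concept]

for name, xs, ys in [("reference", xs_ref, ys_ref),
                     ("data drift", xs_data, ys_data),
                     ("concept drift", xs_concept, ys_concept)]:
    print(f"{name:>13}: mean(x)={mean(xs):.2f} accuracy={accuracy(xs, ys):.2f}")
```

On this toy data, mean(x) moves under data drift while accuracy stays at 1.0. Under concept drift, mean(x) is unchanged, yet accuracy drops to 0.8. This is why concept drift usually only becomes visible once labels arrive.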

Tests: Kolmogorov-Smirnov (KS) test, PSI (Population Stability Index), Wasserstein distance.
A common PSI rule of thumb: < 0.1 = stable, 0.1 to 0.25 = moderate shift, > 0.25 = significant shift (raise an alert).
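The sketch below implements the three tests in plain Python to show what each one measures. PSI is computed as sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i is the reference share and a_i is the current share in bin i. The choices of reference-decile bins and the `eps` floor (which avoids log(0)) are assumptions of this sketch. The KS function returns only the statistic, with no p-value. The Wasserstein version only handles equal-size samples. In practice, `scipy.stats.ks_2samp` and `scipy.stats.wasserstein_distance` cover the general cases.

```python
import bisect
import math

def psi(reference: list[float], current: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index with bin edges at reference quantiles."""
    ref = sorted(reference)
    edges = [ref[int(len(ref) * k / bins)] for k in range(1, bins)]  # bins - 1 cut points

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1  # bin index in [0, bins - 1]
        return [max(c / len(values), eps) for c in counts]  # floor empty bins at eps

    e, a = proportions(reference), proportions(current)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))

def ks_statistic(x: list[float], y: list[float]) -> float:
    """Two-sample KS statistic: max gap between the empirical CDFs."""
    x, y = sorted(x), sorted(y)
    d = 0.0
    for v in x + y:  # the supremum is attained at a sample point
        fx = bisect.bisect_right(x, v) / len(x)
        fy = bisect.bisect_right(y, v) / len(y)
        d = max(d, abs(fx - fy))
    return d

def wasserstein_1d(x: list[float], y: list[float]) -> float:
    """W1 distance for equal-size samples: mean gap between sorted values."""
    if len(x) != len(y):
        raise ValueError("this sketch only handles equal-size samples")
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y))) / len(x)
```

For example, shifting a uniform sample by half its range gives a PSI far above 0.25 and a KS statistic of about 0.5.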

Code: Drift Report with Evidently

drift.py

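# Legacy Evidently API (0.4.x era); newer releases moved Report and the presets
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, RegressionPreset

# ref_df: reference snapshot (e.g. validation data); prod_df: recent production window.
# RegressionPreset needs target and prediction columns (named "target"/"prediction" by default).
report = Report(metrics=[DataDriftPreset(), RegressionPreset()])
report.run(reference_data=ref_df, current_data=prod_df)
report.save_html("drift_report.html")

result = report.as_dict()
# metrics[0] is the dataset-level drift metric contributed by DataDriftPreset
if result["metrics"][0]["result"]["dataset_drift"]:
    alert_slack("⚠️ Data drift detected")  # alert_slack: your own notification helper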
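The snippet assumes an `alert_slack` helper. One possible stdlib-only version posts to a Slack incoming webhook, whose payload is a JSON object with a `text` field. The `SLACK_WEBHOOK_URL` environment variable and `build_slack_request` are hypothetical names chosen for this sketch.

```python
import json
import os
import urllib.request
from typing import Optional

def build_slack_request(text: str, webhook_url: str) -> urllib.request.Request:
    """Build the POST for a Slack incoming webhook (JSON body {"text": ...})."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def alert_slack(text: str, webhook_url: Optional[str] = None, timeout: float = 5.0) -> int:
    """Send an alert and return the HTTP status; URL defaults to a hypothetical env var."""
    url = webhook_url or os.environ["SLACK_WEBHOOK_URL"]
    with urllib.request.urlopen(build_slack_request(text, url), timeout=timeout) as resp:
        return resp.status
```

Splitting request construction from sending keeps the payload testable without network access.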

Prometheus Metrics from FastAPI

metrics.py

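from fastapi import FastAPI
from pydantic import BaseModel
from prometheus_client import Counter, Histogram
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()
# Default HTTP metrics (request counts, latency per route), served at /metrics
Instrumentator().instrument(app).expose(app, endpoint="/metrics")

# Custom ML metrics, registered in the same default registry
PRED = Counter("predictions_total", "Total predictions", ["class"])
LAT = Histogram("predict_latency_seconds", "Model inference latency")

class Features(BaseModel):
    pass  # hypothetical schema: declare the real feature fields here

@app.post("/predict")
def predict(f: Features):
    with LAT.time():  # records elapsed seconds when the block exits
        out = model.predict(...)  # model loaded at startup; build its input from f
    # "class" is a Python keyword, so pass the label value positionally
    PRED.labels(str(out)).inc()
    return {"prediction": str(out)}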
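The `LAT` histogram exports cumulative buckets rather than raw latencies, so p50/p95/p99 are estimated from those buckets. In PromQL this is typically `histogram_quantile(0.95, rate(predict_latency_seconds_bucket[5m]))`. The sketch below approximates that estimate with linear interpolation inside the bucket where the target rank falls. It is an approximation for intuition, not Prometheus's exact code. The bucket boundaries in the test are made-up numbers.

```python
def bucket_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative buckets [(upper_bound, cumulative_count), ...].

    Buckets must be sorted by upper bound, and the last bound must be +Inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # the first bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                return prev_bound  # cannot interpolate into +Inf; use the last finite bound
            # Linear interpolation: assume observations are spread evenly inside the bucket
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound
```

The precision of the estimate depends on the bucket boundaries. Choosing boundaries around your latency SLO (for example 0.1 s and 0.25 s) matters more than the interpolation itself.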

Monitoring Stack

Metrics: Prometheus + Grafana.
Logs: Loki, ELK, Datadog.
Tracing: OpenTelemetry, Jaeger.
ML-specific: Evidently, WhyLabs, Arize, Fiddler.
Alerting: PagerDuty, Slack webhooks.

Feedback Loop

Log every prediction, join it with ground truth when the label arrives, and compute live metrics.
Capture user feedback (thumbs up/down).
Use drift detection to trigger automatic retraining.
Compare a candidate model against production with a shadow deployment.
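Here is a minimal sketch of the first and third steps. Predictions are joined to late-arriving labels by a request ID, and retraining is triggered by a simple rule. The `Prediction` record, the thresholds, and the retrain policy are hypothetical choices for illustration, not a standard.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Prediction:
    request_id: str
    predicted: int

def live_accuracy(pred_log: list[Prediction], labels: dict[str, int]) -> tuple[Optional[float], int]:
    """Join predictions with ground truth by request_id; return (accuracy, n_joined)."""
    joined = [(p.predicted, labels[p.request_id]) for p in pred_log if p.request_id in labels]
    if not joined:
        return None, 0  # no labels yet: nothing to measure
    correct = sum(pred == truth for pred, truth in joined)
    return correct / len(joined), len(joined)

def should_retrain(acc: Optional[float], n: int, drift_detected: bool,
                   min_acc: float = 0.85, min_n: int = 100) -> bool:
    # Hypothetical policy: retrain on drift, or when enough labels show degraded accuracy
    return drift_detected or (acc is not None and n >= min_n and acc < min_acc)
```

The `min_n` guard keeps a handful of early labels from triggering a retrain on noise.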

Common Mistakes

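Tracking only accuracy: latency and business KPIs need monitoring too.
Not updating the reference dataset, so drift keeps being measured against a stale baseline.
Alert fatigue: thresholds so sensitive that alerts fire constantly and get ignored.
Not checking subgroup performance, so bias goes unnoticed.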
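A minimal subgroup check looks like the following. Accuracy is computed per group, and any group trailing the best one by more than a margin is flagged. The record layout and the 0.05 margin are assumptions for illustration. Real fairness audits often use other metrics as well, such as per-group recall or false-positive rate.

```python
from collections import defaultdict

def subgroup_accuracy(records: list[tuple[str, int, int]]) -> dict[str, float]:
    """records: (group, y_true, y_pred) triples; returns {group: accuracy}."""
    totals: dict[str, int] = defaultdict(int)
    correct: dict[str, int] = defaultdict(int)
    for group, y_true, y_pred in records:
        totals[group] += 1
        correct[group] += int(y_true == y_pred)
    return {g: correct[g] / totals[g] for g in totals}

def flag_gaps(acc_by_group: dict[str, float], max_gap: float = 0.05) -> list[str]:
    """Flag groups whose accuracy trails the best group by more than max_gap."""
    best = max(acc_by_group.values())
    return sorted(g for g, a in acc_by_group.items() if best - a > max_gap)
```

With groups A (3 of 4 correct) and B (1 of 2 correct), overall accuracy is about 0.67, which hides the fact that group B sits at 0.5.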

Summary

At a Glance

Monitoring = Operational + Data quality + Drift + Performance + Business KPIs. Prometheus/Grafana + Evidently = a solid base stack.