
Cloud ML Systems

Cloud ML

Deploying ML on AWS, GCP, and Azure.

Hook: Why the cloud?

Your laptop has no GPU, the dataset is 1 TB, and a traffic spike means you suddenly need to scale. The cloud has an answer for each of these problems: pay-as-you-go pricing, managed services, and global infrastructure.

Big Three + Friends

  • AWS: SageMaker, Bedrock, EC2, Lambda.
  • GCP: Vertex AI, GKE, Cloud Run.
  • Azure: Azure ML, OpenAI Service.
  • Specialized: Modal, Replicate, RunPod, Lambda Labs, Together.
  • Edge: Cloudflare Workers AI, Vercel.

ML Service Categories

  • Compute: EC2/GCE GPU instances (A100/H100).
  • Managed Training: SageMaker, Vertex AI Training.
  • Managed Serving: SageMaker Endpoint, Vertex Endpoint.
  • Serverless GPU: Modal, Replicate, RunPod Serverless.
  • Storage: S3, GCS, Azure Blob.
  • Feature Store: SageMaker FS, Feast, Tecton.
  • Vector DB: Pinecone, Weaviate Cloud.
  • LLM API: Bedrock, Vertex Gemini, Azure OpenAI.

Code — SageMaker Endpoint

sagemaker_deploy.py
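import sagemaker
from sagemaker.sklearn import SKLearnModel

# Wrap a trained scikit-learn artifact stored in S3.
model = SKLearnModel(
    model_data="s3://my-bucket/model.tar.gz",
    # get_execution_role() resolves the IAM role when run inside SageMaker
    # (notebook or Studio); elsewhere, pass a role ARN explicitly.
    role=sagemaker.get_execution_role(),
    entry_point="inference.py",   # script that loads the model in the container
    framework_version="1.2-1",    # scikit-learn container version
)

# deploy() creates a real, billed HTTPS endpoint that stays up until deleted.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="iris-prod",
)
print(predictor.predict([[5.1, 3.5, 1.4, 0.2]]))
# Call predictor.delete_endpoint() when finished to stop the charges.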
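The deploy script above points at an entry_point="inference.py" that the chapter does not show. Here is a minimal sketch of what such a script might look like, assuming the SageMaker scikit-learn container's model_fn / predict_fn hooks and assuming the archive contains a file named model.joblib (that file name is a hypothetical choice, not something the chapter specifies).

```python
import os


def model_fn(model_dir):
    """Called once at container start: load the model from the unpacked archive."""
    # joblib is imported lazily; it ships with scikit-learn in the container.
    import joblib

    # ASSUMPTION: model.tar.gz contains a file called "model.joblib".
    return joblib.load(os.path.join(model_dir, "model.joblib"))


def predict_fn(input_data, model):
    """Called per request: run the loaded model on the deserialized input."""
    return model.predict(input_data)
```

If predict_fn is left out, the container falls back to calling model.predict itself, so model_fn is the only hook a simple model really needs.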

Code — Modal Serverless GPU

modal_llm.py
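import modal

# Container image with the inference dependencies baked in.
image = modal.Image.debian_slim().pip_install("transformers", "torch")
app = modal.App("llama-api", image=image)

# One A10G GPU per container; an idle container stays warm for 300 seconds
# before Modal scales it down.
@app.cls(gpu="A10G", scaledown_window=300)
class Model:
    @modal.enter()
    def load(self):
        # Runs once per container start, so the weights are loaded only once.
        # Note: Llama 3.2 is a gated model on Hugging Face; the container
        # needs a Hugging Face token (for example via a Modal secret).
        from transformers import pipeline
        self.gen = pipeline("text-generation", model="meta-llama/Llama-3.2-1B", device=0)

    @modal.method()
    def generate(self, prompt: str):
        return self.gen(prompt, max_new_tokens=120)[0]["generated_text"]

@app.local_entrypoint()
def main():
    # .remote() runs generate() in the cloud container, not on your machine.
    print(Model().generate.remote("Say hello in Bangla"))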

Kubernetes for ML

  • Kubeflow: ML pipelines on K8s.
  • KServe: a CRD for model serving.
  • Ray on K8s: distributed training.
  • Horizontal Pod Autoscaler: load-based scaling.
  • GPU node pool + node selector.
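To make "load-based scaling" concrete, here is a small sketch of the replica calculation the Horizontal Pod Autoscaler documents: desired = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance of 1. The function name, the min/max defaults, and the sample numbers are illustrative assumptions; the real controller also handles missing metrics, pod readiness, and stabilization windows, which this sketch ignores.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified HPA-style replica calculation (illustrative only)."""
    ratio = current_metric / target_metric
    # Within the tolerance band, leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# 2 pods at 90% average CPU with a 60% target: scale up.
print(desired_replicas(2, 90, 60))
# 4 pods at 30% average CPU with a 60% target: scale down.
print(desired_replicas(4, 30, 60))
```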

Cost Optimization

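  • Spot/Preemptible instances: often 70-90% cheaper than on-demand, but they can be reclaimed at short notice.
  • Auto-scaling: scale to zero when idle.
  • Model quantization (int8, 4-bit): lets the model fit on a smaller instance.
  • Batch inference: cheaper than keeping a real-time endpoint running.
  • Caching (Redis): avoid recomputing duplicate requests.
  • Reserved instances: for long-running workloads.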
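A rough back-of-the-envelope sketch can show how these levers add up. The hourly rate, the busy hours, and the 1-billion-parameter model below are made-up numbers for illustration, not real prices or measurements; the memory estimate counts weights only and ignores activations and runtime overhead.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def monthly_cost(hourly_rate, hours_used, discount=0.0):
    """Cost of running one instance for hours_used at a fractional discount."""
    return hourly_rate * hours_used * (1.0 - discount)


def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


RATE = 1.00  # HYPOTHETICAL on-demand price in USD per hour

always_on = monthly_cost(RATE, HOURS_PER_MONTH)             # on-demand, 24/7
spot = monthly_cost(RATE, HOURS_PER_MONTH, discount=0.70)   # 70% spot discount
scale_to_zero = monthly_cost(RATE, 100)                     # only 100 busy hours

print(f"always on: ${always_on:.2f}, spot: ${spot:.2f}, scale-to-zero: ${scale_to_zero:.2f}")

# Weight footprint of a hypothetical 1B-parameter model at different precisions.
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(1e9, bits):.1f} GB")
```

The point of the sketch is the ratio rather than the absolute figures: halving the bits per weight halves the weight memory, which is what lets a quantized model move to a smaller, cheaper instance.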

Security & Compliance

  • IAM: least privilege.
  • VPC, private endpoints.
  • Secret Manager / KMS.
  • Encryption at rest and in transit.
  • Audit logs.
  • Data handling in line with GDPR / HIPAA / SOC 2.
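As an illustration of least privilege, the sketch below builds an AWS IAM policy document that allows reading a single model artifact from S3 and nothing else. The helper name and the Sid are hypothetical; the bucket and key reuse the s3://my-bucket/model.tar.gz path from the SageMaker example. A real serving role would usually need a few more permissions (for example, logging), so treat this as a starting shape rather than a complete policy.

```python
import json


def read_only_model_policy(bucket, key):
    """Build an IAM policy document granting read access to one S3 object."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadModelArtifactOnly",  # hypothetical label
                "Effect": "Allow",
                "Action": ["s3:GetObject"],      # read only: no put, delete, or list
                "Resource": f"arn:aws:s3:::{bucket}/{key}",
            }
        ],
    }


policy = read_only_model_policy("my-bucket", "model.tar.gz")
print(json.dumps(policy, indent=2))
```

Scoping the Resource to one object instead of "*" means a leaked credential for this role cannot read other data in the bucket.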

Summary

At a glance

Cloud ML = managed training + serving + storage + LLM APIs. SageMaker and Vertex AI suit enterprises; Modal and Replicate suit startups. Always factor in cost and security.