Hook: Why Cloud?
Your laptop has no GPU, the dataset is 1 TB, and you need to scale when traffic spikes. Cloud platforms address all three through pay-as-you-go pricing, managed services, and global infrastructure.
Big Three + Friends
- AWS: SageMaker, Bedrock, EC2, Lambda.
- GCP: Vertex AI, GKE, Cloud Run.
- Azure: Azure ML, Azure OpenAI Service.
- Specialized: Modal, Replicate, RunPod, Lambda Labs, Together AI.
- Edge: Cloudflare Workers AI, Vercel.
ML Service Categories
- Compute: EC2/GCE GPU instances (A100/H100).
- Managed Training: SageMaker Training, Vertex AI Training.
- Managed Serving: SageMaker Endpoint, Vertex AI Endpoint.
- Serverless GPU: Modal, Replicate, RunPod Serverless.
- Storage: S3, GCS, Azure Blob Storage.
- Feature Store: SageMaker Feature Store, Feast, Tecton.
- Vector DB: Pinecone, Weaviate Cloud.
- LLM API: Bedrock, Gemini on Vertex AI, Azure OpenAI.
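To make the Vector DB category concrete, here is a minimal sketch of what such a service does at its core: store embeddings and return the nearest ones to a query. It is a toy in-memory version, not the Pinecone or Weaviate API; the names `cosine_similarity`, `top_k`, and the three-dimensional vectors are made up for illustration, and a managed service would use approximate indexes rather than this brute-force scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2):
    """Brute-force nearest-neighbour search: score every vector, keep the best k."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional embeddings; real ones have hundreds of dimensions.
index = {
    "doc-gpu": [0.9, 0.1, 0.0],
    "doc-storage": [0.1, 0.9, 0.0],
    "doc-billing": [0.0, 0.2, 0.9],
}

results = top_k([1.0, 0.0, 0.0], index, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

The query points along the first axis, so "doc-gpu" ranks first and "doc-storage" second.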
Code: SageMaker Endpoint
sagemaker_deploy.py
import sagemaker
from sagemaker.sklearn import SKLearnModel

# Wrap a trained scikit-learn artifact stored in S3.
model = SKLearnModel(
    model_data="s3://my-bucket/model.tar.gz",
    # Resolves inside SageMaker notebooks; elsewhere pass an IAM role ARN.
    role=sagemaker.get_execution_role(),
    entry_point="inference.py",
    framework_version="1.2-1",
)

# Create a real-time endpoint; it keeps billing until it is deleted.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="iris-prod",
)
print(predictor.predict([[5.1, 3.5, 1.4, 0.2]]))
Code: Modal Serverless GPU
modal_llm.py
import modal

image = modal.Image.debian_slim().pip_install("transformers", "torch")
app = modal.App("llama-api", image=image)

# scaledown_window: seconds an idle container stays warm before it is shut down.
@app.cls(gpu="A10G", scaledown_window=300)
class Model:
    @modal.enter()
    def load(self):
        # Runs once per container start, so weights load only on a cold start.
        from transformers import pipeline
        # Gated model: the container needs a Hugging Face token with access.
        self.gen = pipeline("text-generation", model="meta-llama/Llama-3.2-1B", device=0)

    @modal.method()
    def generate(self, prompt: str):
        return self.gen(prompt, max_new_tokens=120)[0]["generated_text"]

@app.local_entrypoint()
def main():
    print(Model().generate.remote("Say hello in Bangla"))
Kubernetes for ML
- Kubeflow: ML pipelines on K8s.
- KServe: model serving through a CRD (InferenceService).
- Ray on K8s: distributed training.
- Horizontal Pod Autoscaler: scales replicas based on load.
- GPU node pool + node selector: pin GPU workloads to GPU nodes.
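To show what "load-based scale" means for the Horizontal Pod Autoscaler, here is a small sketch of the scaling rule described in the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance (0.1 by default) of 1.0. The function name `desired_replicas` and the sample numbers are illustrative assumptions; the real controller also handles unready pods, missing metrics, and stabilization windows, which this sketch ignores.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA rule: scale in proportion to metric / target, then clamp."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: do not scale
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Metric here is average CPU utilisation in percent, with a 60% target (made-up values).
scale_up = desired_replicas(2, current_metric=90, target_metric=60)
scale_down = desired_replicas(4, current_metric=30, target_metric=60)
steady = desired_replicas(3, current_metric=63, target_metric=60)
capped = desired_replicas(8, current_metric=120, target_metric=60)
print(scale_up, scale_down, steady, capped)
```

With these inputs the sketch prints `3 2 3 10`: load 50% above target adds a replica, half the target load halves the count, a 5% deviation is ignored, and the last case is capped at `max_replicas`.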
Cost Optimization
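- Spot/Preemptible instances: roughly 70-90% cheaper than on-demand, but they can be reclaimed at short notice.
- Auto-scaling: scale to zero when idle.
- Model quantization (int8, 4-bit): fits the model on a smaller instance.
- Batch inference: cheaper than real-time serving when latency is not critical.
- Caching (Redis): avoid recomputing duplicate requests.
- Reserved instances: for long-term, steady workloads.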
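A quick back-of-the-envelope comparison helps when weighing these options. The sketch below is only arithmetic under stated assumptions: the $4.00/hour GPU rate, the 100 busy hours per month, and the 70% spot discount (the low end of the range above) are hypothetical numbers, not quotes from any provider, and `monthly_cost` is a made-up helper.

```python
HOURS_PER_MONTH = 730  # 8760 hours per year / 12


def monthly_cost(hourly_rate: float, busy_hours: float, *,
                 spot_discount: float = 0.0, scale_to_zero: bool = False) -> float:
    """Estimate a monthly bill for one instance.

    An always-on instance is billed for every hour of the month;
    with scale-to-zero only the busy hours are billed.
    """
    billed_hours = busy_hours if scale_to_zero else HOURS_PER_MONTH
    return hourly_rate * (1.0 - spot_discount) * billed_hours


# Hypothetical workload: a $4.00/hour GPU that is actually busy 100 hours a month.
on_demand = monthly_cost(4.00, busy_hours=100)
spot = monthly_cost(4.00, busy_hours=100, spot_discount=0.70)
serverless = monthly_cost(4.00, busy_hours=100, scale_to_zero=True)

for name, cost in [("on-demand, always on", on_demand),
                   ("spot, always on", spot),
                   ("on-demand, scale-to-zero", serverless)]:
    print(f"{name}: ${cost:,.2f}")
```

For a workload that sits idle most of the month, scale-to-zero beats even a steep spot discount in this example; a busier workload would shift the balance toward spot or reserved capacity.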
Security & Compliance
- IAM: least privilege.
- VPC, private endpoints.
- Secret Manager / KMS.
- Encryption at rest + in transit.
- Audit logs.
- Data handling in line with GDPR / HIPAA / SOC 2.
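As an illustration of least privilege, the sketch below builds an AWS IAM policy document that allows reading a single model artifact and nothing else. The bucket and key reuse the `s3://my-bucket/model.tar.gz` path from the SageMaker example; the helper `read_only_model_policy` and the `Sid` value are assumptions for illustration. It only produces the JSON and does not attach it to any role.

```python
import json


def read_only_model_policy(bucket: str, key: str) -> dict:
    """IAM policy document granting s3:GetObject on exactly one object."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadModelArtifactOnly",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],  # no write, list, or delete
                "Resource": f"arn:aws:s3:::{bucket}/{key}",  # one object, no wildcard
            }
        ],
    }


policy = read_only_model_policy("my-bucket", "model.tar.gz")
print(json.dumps(policy, indent=2))
```

Starting from a narrow policy like this and widening it only when a request is denied is usually safer than starting from `s3:*` and trying to trim it later.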
Summary
At a glance
Cloud ML = managed training + serving + storage + LLM APIs. SageMaker/Vertex AI fit enterprises; Modal/Replicate fit startups. Always account for cost and security.