Hook: The Ability to See
Self-driving cars, medical imaging, face unlock, AR filters: all of these are Computer Vision. Understanding objects and scenes from raw pixels is what CV is about.
Core Tasks
- Image Classification: one label for the whole image.
- Object Detection: a bounding box + class for each object (YOLO, Faster R-CNN).
- Semantic Segmentation: a class for every pixel (U-Net).
- Instance Segmentation: each object instance kept separate, even within the same class (Mask R-CNN).
- Keypoint / Pose Estimation: locating points such as body joints.
- OCR: text recognition.
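The difference between classification and semantic segmentation is mostly about the shape of the output. A minimal sketch, assuming a model has already produced per-class scores (the class names and numbers below are made up): classification takes one argmax over the whole image, semantic segmentation takes an argmax at every pixel.

```python
# Hypothetical class list, for illustration only.
CLASSES = ["background", "cat", "dog"]

def argmax(scores):
    """Index of the highest score."""
    return max(range(len(scores)), key=lambda c: scores[c])

def classify(image_scores):
    """Image classification: one score vector -> one label for the whole image."""
    return CLASSES[argmax(image_scores)]

def segment(pixel_scores):
    """Semantic segmentation: one score vector per pixel -> one label per pixel."""
    return [[CLASSES[argmax(s)] for s in row] for row in pixel_scores]

# Whole-image scores: a single vector.
print(classify([0.1, 0.7, 0.2]))  # cat

# A 2x2 image: every pixel has its own score vector.
pixel_scores = [
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]],
    [[0.1, 0.1, 0.8], [0.6, 0.3, 0.1]],
]
print(segment(pixel_scores))
# [['background', 'cat'], ['dog', 'background']]
```

Instance segmentation goes one step further: two neighbouring "dog" pixels could still belong to different dogs, which a per-pixel class map alone cannot express.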
Famous Models
- ResNet, EfficientNet: classification backbones.
- YOLO v5/v8: fast detection.
- U-Net: segmentation, especially of medical images.
- Mask R-CNN: instance segmentation.
- ViT, Swin Transformer: Transformer-based CV.
- SAM (Segment Anything): a foundation model for segmentation.
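ViT treats an image as a sequence of patches. Assuming 16x16 patches on a 224x224 input (a common ViT setting), the image becomes 14 x 14 = 196 tokens. A minimal single-channel sketch of only the splitting step; a real ViT then flattens each RGB patch, applies a learned linear projection and adds position embeddings, all omitted here.

```python
def patchify(image, patch):
    """Split an H x W image (list of rows) into non-overlapping patch x patch tiles.
    Tiles are ordered row by row, and each tile is flattened row by row."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tile = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            tokens.append(tile)
    return tokens

def num_patches(h, w, patch):
    """Sequence length a ViT sees (before any extra class token)."""
    return (h // patch) * (w // patch)

print(num_patches(224, 224, 16))  # 196

img = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 toy image
print(patchify(img, 2))
# [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]]
```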
Code — YOLO Detection
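yolo_detect.py
from ultralytics import YOLO
model = YOLO("yolov8n.pt")  # pre-trained weights (downloaded on first use)
results = model("street.jpg")  # one Results object per input image
results[0].show()  # display the image with the predicted boxes drawn
# each box: class id, confidence score, corner coordinates (x1, y1, x2, y2)
for box in results[0].boxes:
    print(box.cls, box.conf, box.xyxy)
Image Classification (Transfer Learning)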
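The last two preprocessing steps used in transfer_cls.py are simple per-channel arithmetic: ToTensor scales 0-255 pixel values to [0, 1], then Normalize subtracts the ImageNet mean and divides by the ImageNet std. A minimal pure-Python sketch for a single RGB pixel (normalize_pixel is a hypothetical helper, not a torchvision function):

```python
# ImageNet per-channel statistics, the same values passed to transforms.Normalize.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Mimic ToTensor (0-255 -> 0-1) followed by Normalize: (x - mean) / std per channel."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, MEAN, STD))

# A pixel close to the ImageNet mean colour maps to values near 0.
print(normalize_pixel((124, 116, 104)))

# Pure white maps to large positive values.
print([round(x, 2) for x in normalize_pixel((255, 255, 255))])  # [2.25, 2.43, 2.64]
```

A backbone trained on inputs centred around 0 like this sees a very different distribution if raw 0-1 (or 0-255) values are fed in instead.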
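transfer_cls.py
import torch.nn as nn
from torchvision import models, transforms
num_classes = 5  # hypothetical: set to the number of classes in your dataset
# ResNet-50 with ImageNet pre-trained weights
model = models.resnet50(weights="IMAGENET1K_V2")
# freeze the backbone so only the new head is trained
for p in model.parameters():
    p.requires_grad = False
# replace the final layer; a newly created layer is trainable by default
model.fc = nn.Linear(model.fc.in_features, num_classes)
# standard ImageNet-style preprocessing
tfm = transforms.Compose([
    transforms.Resize(256),  # shorter side -> 256, aspect ratio kept
    transforms.CenterCrop(224),
    transforms.ToTensor(),  # HWC uint8 0-255 -> CHW float 0-1
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),  # ImageNet mean/std
])
Data Augmentation (Albumentations)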
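The transforms in augment.py operate on pixel arrays. A minimal pure-Python sketch of two of them on a tiny grayscale image (an illustration, not the Albumentations implementation): a horizontal flip mirrors each row, and Cutout-style dropout zeroes square holes at random positions. For detection, the boxes must be flipped together with the image; hflip_box is a hypothetical helper showing that arithmetic.

```python
import random

def hflip(image):
    """Mirror each row left-to-right, as HorizontalFlip does when it fires."""
    return [row[::-1] for row in image]

def hflip_box(box, width):
    """Flip an (x1, y1, x2, y2) box in pixel coordinates so it still covers the same object."""
    x1, y1, x2, y2 = box
    return (width - x2, y1, width - x1, y2)

def cutout(image, num_holes, size, rng):
    """Set num_holes square patches of side `size`, at random positions, to 0."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]  # copy so the input is left untouched
    for _ in range(num_holes):
        top = rng.randrange(0, h - size + 1)
        left = rng.randrange(0, w - size + 1)
        for r in range(top, top + size):
            for c in range(left, left + size):
                out[r][c] = 0
    return out

rng = random.Random(0)
img = [[1, 2, 3], [4, 5, 6]]
print(hflip(img))                        # [[3, 2, 1], [6, 5, 4]]
print(hflip_box((0, 0, 1, 2), width=3))  # (2, 0, 3, 2)

# Apply the flip with probability 0.5, like the p= argument in Albumentations.
flipped = hflip(img) if rng.random() < 0.5 else img
print(cutout([[9] * 4 for _ in range(4)], num_holes=1, size=2, rng=rng))
```

In Albumentations itself, boxes are kept in sync by passing bbox_params to A.Compose rather than flipping them by hand.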
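augment.py
import albumentations as A
# each transform fires independently with probability p
tfm = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.3),
    A.ShiftScaleRotate(p=0.5),
    # Cutout is deprecated/removed in newer Albumentations; CoarseDropout replaces it.
    # Parameter names follow Albumentations 2.x (older 1.x: max_holes, max_height, max_width).
    A.CoarseDropout(num_holes_range=(4, 4), hole_height_range=(16, 16),
                    hole_width_range=(16, 16), p=0.3),
])
# usage: augmented = tfm(image=image)["image"]  (image is an HxWxC NumPy array)
Metrics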
- Classification: Accuracy, F1 (more informative than accuracy when classes are imbalanced).
- Detection: mAP (mean Average Precision), IoU.
- Segmentation: Dice, IoU.
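IoU and Dice are both overlap ratios. A minimal sketch, assuming boxes are (x1, y1, x2, y2) and masks are 0/1 grids: box IoU is intersection area over union area, and for masks Dice = 2|A∩B| / (|A| + |B|). Detection mAP builds on box IoU: a prediction usually counts as correct only if its IoU with a ground-truth box clears a threshold (0.5 in PASCAL VOC; COCO averages over 0.5 to 0.95).

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes: intersection area / union area."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # 0 if the boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def mask_dice_iou(pred, target):
    """Dice and IoU for two binary (0/1) masks of the same shape."""
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    inter = sum(1 for x, y in zip(p, t) if x and y)
    total = sum(p) + sum(t)  # |A| + |B|
    union = total - inter    # |A ∪ B|
    dice = 2 * inter / total if total else 1.0  # two empty masks count as a perfect match
    iou = inter / union if union else 1.0
    return dice, iou

print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
print(mask_dice_iou([[1, 1], [0, 0]], [[1, 0], [1, 0]]))  # (0.5, 0.333...)
```

For the same pair of masks, Dice = 2 * IoU / (1 + IoU), so the two metrics always rank predictions in the same order; Dice just reports a higher number.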
Common Mistakes
- Using a pre-trained model without the ImageNet mean/std normalization it was trained with.
- Resizing in a way that distorts the aspect ratio.
- Ignoring class imbalance (very harmful when detecting rare classes).
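The last two mistakes have simple, standard remedies. A minimal sketch, assuming a square 640x640 model input (a common YOLO size): letterboxing scales both sides by the same factor and pads the rest instead of stretching, and "balanced" class weights (the n_samples / (n_classes * count) formula scikit-learn uses for class_weight="balanced") give rare classes more weight in the loss. The helper names and labels are hypothetical; the weights could, for example, be passed as the weight tensor of PyTorch's CrossEntropyLoss.

```python
from collections import Counter

def letterbox_params(w, h, target=640):
    """Fit a w x h image inside a target x target square without distorting it.
    Returns the resized (w, h) and the padding (left, top, right, bottom)."""
    scale = min(target / w, target / h)  # one factor for both axes keeps the aspect ratio
    new_w, new_h = round(w * scale), round(h * scale)
    pad_x, pad_y = target - new_w, target - new_h
    left, top = pad_x // 2, pad_y // 2
    return (new_w, new_h), (left, top, pad_x - left, pad_y - top)

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * count): rare classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

print(letterbox_params(1280, 720))  # ((640, 360), (0, 140, 0, 140))

labels = ["car"] * 90 + ["pedestrian"] * 9 + ["cyclist"] * 1  # made-up, imbalanced
weights = balanced_class_weights(labels)
print({c: round(wt, 2) for c, wt in weights.items()})
# {'car': 0.37, 'pedestrian': 3.7, 'cyclist': 33.33}
```

With these weights every class contributes the same total weight (count x weight) to the loss, so the single cyclist example is no longer drowned out by 90 cars.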
Summary
At a glance
CV = choose the task → pre-trained backbone → augmentation → the right metric. Today is the era of ViT and foundation models (SAM, CLIP).