Phase 5 · Specialized AI · 5.2 · 30 min read

Computer Vision: The Ability to See


Image classification, detection, and segmentation.

Hook: The Ability to See

Self-driving cars, medical imaging, face unlock, AR filters: all of these are Computer Vision. Going from raw pixels to an understanding of objects and scenes is exactly what CV does.

Core Tasks

  • Image Classification: a single label for the whole image.
  • Object Detection: a bounding box + class for each object (YOLO, Faster R-CNN).
  • Semantic Segmentation: a class label for every pixel (U-Net).
  • Instance Segmentation: each individual object gets its own mask (Mask R-CNN).
  • Keypoint / Pose Estimation: locating points such as body joints.
  • OCR: text recognition.
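To make the difference between semantic and instance segmentation concrete, here is a toy sketch in NumPy. It assumes a tiny binary semantic mask for a single class ("car") and splits it into instances with 4-connected component labelling; `label_instances` is a hypothetical helper. Real instance segmentation models such as Mask R-CNN predict a separate mask per object and can separate touching objects, which connected components cannot.

```python
from collections import deque

import numpy as np


def label_instances(semantic_mask: np.ndarray) -> tuple[np.ndarray, int]:
    """Split a binary semantic mask into instances via 4-connected components.

    Hypothetical toy helper: touching objects would merge into one instance.
    """
    h, w = semantic_mask.shape
    instances = np.zeros((h, w), dtype=np.int32)  # 0 = background
    next_id = 0
    for y in range(h):
        for x in range(w):
            if semantic_mask[y, x] and instances[y, x] == 0:
                next_id += 1
                instances[y, x] = next_id
                queue = deque([(y, x)])
                while queue:  # breadth-first flood fill from the seed pixel
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and semantic_mask[ny, nx] and instances[ny, nx] == 0:
                            instances[ny, nx] = next_id
                            queue.append((ny, nx))
    return instances, next_id


# Semantic mask: every "car" pixel is 1, but it does not say which car it belongs to
semantic = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
])
instances, count = label_instances(semantic)
print(count)  # 2 separate cars
print(instances)
```

The semantic mask answers "which pixels are car?"; the instance map additionally answers "which car?", giving each object its own ID.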

Famous Models

  • ResNet, EfficientNet: classification backbones.
  • YOLO v5/v8: fast detection.
  • U-Net: segmentation, especially in medical imaging.
  • Mask R-CNN: instance segmentation.
  • ViT, Swin Transformer: Transformer-based CV.
  • SAM (Segment Anything): a foundation model for segmentation.
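ViT treats an image as a sequence of patch tokens. A minimal NumPy sketch of that first step, assuming a 224×224 RGB image and 16×16 patches (the common ViT-B/16 setting): the image is cut into non-overlapping patches and each patch is flattened into one vector. A real ViT then applies a learned linear projection and adds position embeddings, which this sketch leaves out; `patchify` is a hypothetical helper name.

```python
import numpy as np


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch * patch * C), patches in row-major order.
    """
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    gh, gw = h // patch, w // patch
    # (gh, patch, gw, patch, C) -> (gh, gw, patch, patch, C) -> (gh*gw, patch*patch*C)
    x = image.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, patch * patch * c)


img = np.random.rand(224, 224, 3).astype(np.float32)
tokens = patchify(img, 16)
print(tokens.shape)  # (196, 768): 14 x 14 patches, each 16 * 16 * 3 values
```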

Code — YOLO Detection

yolo_detect.py
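from ultralytics import YOLO

model = YOLO("yolov8n.pt")          # pre-trained weights, downloaded on first use
results = model("street.jpg")       # one Results object per input image
results[0].show()                   # display the image with boxes drawn
for box in results[0].boxes:
    cls_id = int(box.cls)           # class index of this detection
    print(model.names[cls_id], float(box.conf), box.xyxy[0].tolist())  # name, confidence, [x1, y1, x2, y2]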
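Detectors such as YOLO predict many overlapping candidate boxes and then keep only the best one per object with non-maximum suppression (NMS); the Ultralytics call above already does this internally. Below is a plain NumPy sketch of greedy, class-agnostic NMS, assuming boxes in `[x1, y1, x2, y2]` pixel format (as in `box.xyxy`) and an IoU threshold of 0.5 chosen for illustration. It is not the Ultralytics implementation, which, for example, also handles classes.

```python
import numpy as np


def box_iou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """IoU between one [x1, y1, x2, y2] box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5) -> list[int]:
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it, repeat."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = box_iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_thresh]  # keep only weakly overlapping boxes
    return keep


boxes = np.array([
    [10, 10, 50, 50],      # a car, high score
    [12, 12, 52, 52],      # near-duplicate of the first box
    [100, 100, 140, 140],  # a different object
], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```

The duplicate box has an IoU of about 0.82 with the first one, so it is suppressed, while the distant box survives.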

Image Classification (Transfer Learning)

transfer_cls.py
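import torch.nn as nn
from torchvision import models, transforms

num_classes = 5  # hypothetical: set to the number of classes in your dataset

# Load ResNet-50 with ImageNet pre-trained weights
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
# Freeze the backbone so only the new head is trained
for p in model.parameters():
    p.requires_grad = False
# Replace the 1000-class ImageNet head; a newly created layer is trainable by default
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Preprocessing must match what the backbone saw during ImageNet training
tfm = transforms.Compose([
    transforms.Resize(256),        # shorter side -> 256, aspect ratio preserved
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),  # ImageNet mean/std
])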
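What `ToTensor()` and `Normalize()` do to each pixel can be written out by hand. A NumPy sketch, assuming an 8-bit RGB image in H×W×C layout: scale to [0, 1], subtract the ImageNet per-channel mean, divide by the std, and move channels first. `to_normalized_chw` is a hypothetical helper name; skipping this step with a pre-trained backbone is the mismatch listed under Common Mistakes.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_normalized_chw(image_uint8: np.ndarray) -> np.ndarray:
    """Mimic ToTensor() + Normalize(mean, std) for an (H, W, 3) uint8 RGB image."""
    x = image_uint8.astype(np.float32) / 255.0  # ToTensor: scale to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD      # Normalize: per-channel standardization
    return x.transpose(2, 0, 1)                 # ToTensor: HWC -> CHW layout


# A 2x2 image whose pixels are (approximately) the ImageNet mean colour
pixel = np.round(IMAGENET_MEAN * 255).astype(np.uint8)
img = np.tile(pixel, (2, 2, 1))
out = to_normalized_chw(img)
print(out.shape)  # (3, 2, 2); all values are close to 0
```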

Data Augmentation (Albumentations)

augment.py
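import albumentations as A

# Training-time augmentation pipeline
tfm = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.3),
    A.ShiftScaleRotate(p=0.5),
    # Cutout has been removed from recent Albumentations releases; CoarseDropout
    # replaces it (argument names below assume Albumentations 2.x)
    A.CoarseDropout(num_holes_range=(1, 4), hole_height_range=(16, 16),
                    hole_width_range=(16, 16), p=0.3),
])
# Usage: augmented = tfm(image=image)["image"]   # image: H x W x C NumPy array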
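Geometric augmentations must transform the labels too: if an image is flipped horizontally, every bounding box has to be mirrored as well, otherwise the boxes no longer match the objects. Albumentations handles this when you pass `bbox_params` to `A.Compose`. The NumPy sketch below shows only the underlying arithmetic for a horizontal flip, assuming `[x1, y1, x2, y2]` pixel boxes; `hflip_with_boxes` is a hypothetical helper name.

```python
import numpy as np


def hflip_with_boxes(image: np.ndarray, boxes: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Horizontally flip an (H, W, C) image and its (N, 4) [x1, y1, x2, y2] boxes."""
    w = image.shape[1]
    flipped = image[:, ::-1]            # reverse the column order
    new_boxes = boxes.copy()
    new_boxes[:, 0] = w - boxes[:, 2]   # new left edge comes from the old right edge
    new_boxes[:, 2] = w - boxes[:, 0]   # new right edge comes from the old left edge
    return flipped, new_boxes


img = np.zeros((100, 200, 3), dtype=np.uint8)  # H=100, W=200
boxes = np.array([[10, 20, 60, 80]], dtype=float)
_, flipped_boxes = hflip_with_boxes(img, boxes)
print(flipped_boxes.tolist())  # [[140.0, 20.0, 190.0, 80.0]]
```

The y-coordinates are untouched; only the x-coordinates are mirrored and swapped so that x1 stays smaller than x2.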

Metrics

  • Classification: Accuracy, F1.
  • Detection: mAP (mean Average Precision), IoU.
  • Segmentation: Dice, IoU.
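IoU and Dice both measure the overlap between a predicted region A and a ground-truth region B: IoU = |A∩B| / |A∪B| and Dice = 2|A∩B| / (|A| + |B|). A NumPy sketch for binary segmentation masks follows; returning 1.0 when both masks are empty is a convention chosen here, not a universal rule. For detection, the same IoU formula applied to boxes decides whether a prediction counts as a true positive when computing mAP (for example at an IoU threshold of 0.5).

```python
import numpy as np


def mask_iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over Union for two binary masks of the same shape."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float(inter / union) if union > 0 else 1.0  # both empty: treated as a perfect match


def dice(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    return float(2 * inter / total) if total > 0 else 1.0


target = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 0]])
pred = np.array([[0, 1, 1, 0],
                 [0, 1, 1, 0]])
print(mask_iou(pred, target))  # 2 / 6 = 0.333...
print(dice(pred, target))      # 4 / 8 = 0.5
```

Dice is always at least as large as IoU for the same pair of masks, so the two numbers are not directly interchangeable when comparing papers.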

Common Mistakes

  • Using a pre-trained model without normalizing inputs with the ImageNet mean/std.
  • Resizing in a way that distorts the aspect ratio.
  • Ignoring class imbalance (critical when detecting rare classes).
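One way to resize without distorting the aspect ratio is letterboxing: scale the image so that it fits inside the target while keeping its proportions, then pad the leftover space. YOLO-style pipelines commonly do this. A minimal sketch of the size and padding arithmetic, assuming a square 640×640 target and even padding on both sides; `letterbox_params` is a hypothetical helper name.

```python
def letterbox_params(width: int, height: int, target: int = 640) -> dict:
    """Compute resize size and padding to fit an image into a square target
    without changing its aspect ratio (hypothetical helper, for illustration)."""
    scale = min(target / width, target / height)  # the longer side limits the scale
    new_w, new_h = round(width * scale), round(height * scale)
    pad_w, pad_h = target - new_w, target - new_h
    return {
        "scale": scale,
        "resized": (new_w, new_h),
        # split the padding evenly between both sides
        "pad_left": pad_w // 2, "pad_right": pad_w - pad_w // 2,
        "pad_top": pad_h // 2, "pad_bottom": pad_h - pad_h // 2,
    }


p = letterbox_params(1920, 1080)
print(p["resized"])                   # (640, 360)
print(p["pad_top"], p["pad_bottom"])  # 140 140
```

A 16:9 frame squashed straight to 640×640 would make every object look taller and narrower; letterboxing keeps shapes intact at the cost of some padded pixels.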

Summary

At a glance

CV = choose the task → pre-trained backbone → augmentation → the right metric. Today is the era of ViT and foundation models (SAM, CLIP).