Computer Vision: The Ability to See (Machine Learning)

Hook: The Ability to See

Self-driving cars, medical imaging, face unlock, AR filters: all of these are Computer Vision. Turning raw pixels into an understanding of objects and scenes is the job of CV.

Core Tasks

Image Classification: one label for the whole image.
Object Detection: a bounding box plus a class for each object (YOLO, Faster R-CNN).
Semantic Segmentation: a class for every pixel (U-Net).
Instance Segmentation: a separate mask for each individual object (Mask R-CNN).
Keypoint / Pose Estimation: locating landmark points such as body joints.
OCR: recognizing text in images.

Famous Models

ResNet, EfficientNet: classification backbones.
YOLO v5/v8: fast detection.
U-Net: segmentation, widely used in medical imaging.
Mask R-CNN: instance segmentation.
ViT, Swin Transformer: Transformer-based CV.
SAM (Segment Anything): a foundation model for segmentation.

Code: YOLO Detection

yolo_detect.py

from ultralytics import YOLO

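model = YOLO("yolov8n.pt")          # pre-trained nano checkpoint
results = model("street.jpg")       # one Results object per input image
results[0].show()                   # display the image with boxes drawn
for box in results[0].boxes:
    # class id, confidence score, and (x1, y1, x2, y2) corners, each as a tensor
    print(box.cls, box.conf, box.xyxy)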

Image Classification (Transfer Learning)

transfer_cls.py

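import torch.nn as nn
from torchvision import models, transforms

num_classes = 10  # placeholder: set to the number of classes in your dataset

model = models.resnet50(weights="IMAGENET1K_V2")
# Freeze the pre-trained backbone so only the new head is trained
for p in model.parameters():
    p.requires_grad = False
# Replace the 1000-class ImageNet head; the new layer is trainable by default
model.fc = nn.Linear(model.fc.in_features, num_classes)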

tfm = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485,0.456,0.406], [0.229,0.224,0.225]),
])
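
To make the last two steps of the transform above concrete, here is a minimal pure-Python sketch of the per-channel arithmetic that ToTensor and Normalize perform on a single pixel. The helper name normalize_pixel is hypothetical and exists only for illustration; in practice torchvision does this on whole tensors.

```python
# ImageNet channel statistics, the same values passed to transforms.Normalize.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb_uint8):
    """Map one 8-bit RGB pixel to the value range a pre-trained backbone expects.

    Hypothetical helper for illustration only.
    """
    out = []
    for value, mean, std in zip(rgb_uint8, IMAGENET_MEAN, IMAGENET_STD):
        scaled = value / 255.0             # ToTensor step: [0, 255] -> [0.0, 1.0]
        out.append((scaled - mean) / std)  # Normalize step: subtract mean, divide by std
    return tuple(out)


mid_gray = normalize_pixel((128, 128, 128))
print([round(c, 3) for c in mid_gray])
```

Because each channel has its own mean and std, a neutral gray pixel ends up with three different values. Skipping this step feeds the backbone inputs on a scale it never saw during pre-training.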

Data Augmentation (Albumentations)

augment.py

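import albumentations as A

tfm = A.Compose([
    A.HorizontalFlip(p=0.5),            # mirror left-right half the time
    A.RandomBrightnessContrast(p=0.3),  # random lighting changes
    A.ShiftScaleRotate(p=0.5),          # small random translate / zoom / rotate
    # Cutout is deprecated and absent from recent Albumentations releases;
    # CoarseDropout is its replacement there (argument names vary by version).
    A.Cutout(num_holes=4, max_h_size=16, max_w_size=16, p=0.3),
])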

Metrics

Classification: Accuracy, F1.
Detection: mAP (mean Average Precision), IoU.
Segmentation: Dice, IoU.
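
IoU and Dice from the list above are simple overlap ratios. The following is a minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corners and masks as flat sequences of 0/1 values; the function names are hypothetical, and real projects would normally use a library implementation on arrays.

```python
def box_iou(a, b):
    """Intersection over Union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mask_iou(pred, target):
    """IoU of two binary masks given as flat sequences of 0/1."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Convention assumed here: two empty masks count as a perfect match
    return inter / union if union > 0 else 1.0


def mask_dice(pred, target):
    """Dice coefficient: 2 * |A and B| / (|A| + |B|)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 2 * inter / total if total > 0 else 1.0


print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))
print(mask_iou([1, 1, 0, 0], [1, 0, 1, 0]), mask_dice([1, 1, 0, 0], [1, 0, 1, 0]))
```

For the same pair of masks, Dice is never lower than IoU, since Dice = 2 * IoU / (1 + IoU). Compare scores only within one metric.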

Common Mistakes

Using a pre-trained model without ImageNet mean/std normalization.
Resizing in a way that distorts the aspect ratio.
Ignoring class imbalance (especially damaging when detecting rare classes).
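
The last two mistakes in this list each have a common remedy, sketched below in pure Python. One is a letterbox-style resize that scales the longer side to the target and pads the rest. The other is inverse-frequency class weights for the loss. Both helper names, the 640 target size, and the example labels are assumptions chosen for illustration, not part of any particular library.

```python
from collections import Counter


def letterbox_size(width, height, target=640):
    """Scale so the longer side equals `target`, keeping the aspect ratio.

    Returns the resized (w, h) and the total (pad_w, pad_h) needed to
    reach a target x target square.
    """
    scale = target / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    return (new_w, new_h), (target - new_w, target - new_h)


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


size, padding = letterbox_size(1920, 1080)
print(size, padding)

weights = inverse_frequency_weights(["car"] * 90 + ["cyclist"] * 10)
print(weights)
```

The resulting weights could be passed to a weighted loss so that errors on the rare class cost more. Whether this or resampling works better depends on the dataset.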

Summary

At a Glance

CV = choose the task → pre-trained backbone → augmentation → the right metric. Today is the era of ViT and foundation models (SAM, CLIP).