CNN: The Network That Recognizes Images (Machine Learning)

Hook: The Magic of Recognizing Images

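A 224×224 RGB image holds 150,528 input values (50,176 pixels × 3 channels). In a fully connected network, the first layer alone would need tens of millions of parameters or more. A CNN handles this neatly: it learns local patterns, shares weights across positions, and is largely insensitive to where in the image a pattern appears.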
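To make the parameter gap concrete, here is a back-of-the-envelope count. The hidden width of 1,000 units and the 64 filters of size 3×3 are illustrative assumptions, not figures from the text.

```python
# Rough parameter count: one dense layer vs. one convolutional layer.
# hidden_units and num_filters are illustrative assumptions.
height, width, channels = 224, 224, 3
input_values = height * width * channels  # numbers fed to the first layer

hidden_units = 1000  # assumed width of the first dense layer
fc_params = input_values * hidden_units + hidden_units  # weights + biases

num_filters, kernel = 64, 3  # assumed conv layer: 64 filters of size 3x3
conv_params = num_filters * (kernel * kernel * channels) + num_filters

print(f"input values: {input_values:,}")
print(f"dense layer : {fc_params:,} parameters")
print(f"conv layer  : {conv_params:,} parameters")
```

The conv layer's count does not depend on the image size at all, because the same small filters are reused at every position.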

Convolution Operation

A small kernel/filter (for example 3×3) slides over the image and takes a dot product at each position. The result is a feature map.

(I * K)(i, j) = Σ_m Σ_n I(i+m, j+n) · K(m, n)
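The formula can be sketched directly in plain Python. This is a minimal illustration with no padding and a stride of 1; the function name `conv2d_valid`, the toy image, and the hand-written edge kernel are assumptions made for the example. Note that the formula, like the convolution layers in deep learning libraries, does not flip the kernel, so strictly speaking it is a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product of the kernel with the patch whose top-left corner is (i, j).
            total = sum(
                image[i + m][j + n] * kernel[m][n]
                for m in range(kh)
                for n in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4x5 image: dark left part (0), bright right part (1).
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
# Hand-written vertical-edge kernel; in a CNN these weights are learned.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[3, 3, 0], [3, 3, 0]]
```

The response is large where the window straddles the dark-to-bright edge and zero over the flat bright region, which is the behavior expected of an edge detector.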

First layers: detect edges and colors.
Middle layers: textures and shapes.
Deep layers: object parts, faces.

CNN Building Blocks

Conv Layer: feature extraction.
Activation (ReLU): non-linearity.
Pooling (Max/Avg): reduces spatial size and adds some translation invariance.
Stride: how far the kernel jumps at each step.
Padding: controls what happens at the border (same keeps the size, valid shrinks it).
Fully Connected: classification at the end.
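How stride, padding, and pooling change the spatial size can be checked with a small sketch. It uses the standard output-size relation, floor((n + 2·padding − kernel) / stride) + 1; the helper names `conv_output_size` and `max_pool2d` and the toy feature map are assumptions for illustration.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one spatial dimension for a conv or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


def max_pool2d(feature_map, size=2, stride=2):
    """Max pooling over a 2D list; windows that do not fit are dropped."""
    out_h = conv_output_size(len(feature_map), size, stride)
    out_w = conv_output_size(len(feature_map[0]), size, stride)
    return [
        [
            max(
                feature_map[i * stride + m][j * stride + n]
                for m in range(size)
                for n in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# A 3x3 kernel at stride 1: padding=1 ("same") keeps the size, padding=0 ("valid") shrinks it.
same_size = conv_output_size(28, kernel=3, stride=1, padding=1)
valid_size = conv_output_size(28, kernel=3, stride=1, padding=0)
# A 2x2 max pool with stride 2 halves the size.
pooled_size = conv_output_size(same_size, kernel=2, stride=2)
print(same_size, valid_size, pooled_size)  # 28 26 14

fm = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool2d(fm)
print(pooled)  # [[4, 2], [2, 8]]
```

Each 2×2 window keeps only its strongest response, so a small shift of a feature inside a window leaves the pooled output unchanged.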

Famous Architectures

LeNet (1998): digit recognition.
AlexNet (2012): the ImageNet revolution.
VGG: a simple, deep architecture (16/19 layers).
ResNet: residual / skip connections, up to 152 layers.
Inception: multi-scale filters in parallel.
EfficientNet: compound scaling.

Code: PyTorch CNN

cnn_mnist.py

import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool  = nn.MaxPool2d(2, 2)
        self.fc1   = nn.Linear(64 * 7 * 7, 128)
        self.fc2   = nn.Linear(128, 10)
        self.relu  = nn.ReLU()

    def forward(self, x):
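        x = self.pool(self.relu(self.conv1(x)))  # (N, 1, 28, 28) -> (N, 32, 14, 14)
        x = self.pool(self.relu(self.conv2(x)))  # (N, 32, 14, 14) -> (N, 64, 7, 7)
        x = x.view(x.size(0), -1)                # flatten to (N, 64 * 7 * 7)
        x = self.relu(self.fc1(x))
        return self.fc2(x)                       # raw logits for the 10 classes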
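As a sanity check on the layer sizes in the listing above, the following sketch counts the model's learnable parameters by hand. It assumes a 28×28 single-channel input, as the shape comments in `forward` suggest; the helper names are made up for illustration and the code does not need PyTorch.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 2D conv layer with a square kernel."""
    return out_channels * in_channels * kernel_size * kernel_size + out_channels


def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


# Spatial size: each conv (3x3, padding=1) keeps it, each 2x2 pool halves it.
size = 28
for _ in range(2):
    size //= 2
flat_features = 64 * size * size  # input width of fc1

counts = {
    "conv1": conv_params(1, 32, 3),
    "conv2": conv_params(32, 64, 3),
    "fc1": linear_params(flat_features, 128),
    "fc2": linear_params(128, 10),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name}: {n:,}")
print(f"total: {total:,}")
```

Most of the parameters sit in `fc1`, which is typical for small CNNs: the convolutional layers are cheap because their weights are shared across positions.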

Transfer Learning

Take a ResNet/EfficientNet pre-trained on ImageNet and fine-tune it on your own small dataset. This usually beats training from scratch.

transfer.py

from torchvision import models
import torch.nn as nn

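num_classes = 10  # placeholder: set this to the number of classes in your dataset

model = models.resnet50(weights="IMAGENET1K_V2")
for p in model.parameters():
    p.requires_grad = False  # freeze the pre-trained backbone
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new head, trainable by default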

Data Augmentation

Random Crop
Horizontal Flip
Color Jitter
Rotation
Cutout / Mixup
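Two of the augmentations listed above, random crop and horizontal flip, can be sketched on a plain nested-list "image". This is only a toy illustration under assumed names (`horizontal_flip`, `random_crop`, `augment`) and an assumed 3×3 crop size; in a PyTorch pipeline one would normally use the ready-made transforms in `torchvision.transforms` instead.

```python
import random


def horizontal_flip(image):
    """Mirror every row left to right."""
    return [row[::-1] for row in image]


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def augment(image, rng, flip_prob=0.5):
    """Random 3x3 crop followed by a horizontal flip with probability flip_prob."""
    out = random_crop(image, 3, 3, rng)
    if rng.random() < flip_prob:
        out = horizontal_flip(out)
    return out


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
image = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 "image"
sample = augment(image, rng)
print(sample)
```

Because a fresh random variant is drawn every time an image is loaded, the network rarely sees exactly the same input twice, which helps against overfitting on small datasets.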

Common Mistakes

Skipping input normalization (mean/std).
Running inference in train mode (forgetting model.eval()).
Training from scratch on a small dataset.
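The first mistake can be illustrated with a minimal sketch of mean/std normalization for a single channel. The pixel values and helper names are made up for the example; the point it tries to show is that the statistics are computed on the training data and then reused unchanged at inference time.

```python
from statistics import mean, pstdev


def channel_stats(pixels):
    """Mean and (population) standard deviation of one channel's pixel values."""
    return mean(pixels), pstdev(pixels)


def normalize(pixels, mu, sigma):
    """Shift and scale so that training data has mean 0 and std 1."""
    return [(p - mu) / sigma for p in pixels]


train_pixels = [0.0, 0.25, 0.5, 0.75, 1.0]  # toy values in [0, 1]
mu, sigma = channel_stats(train_pixels)      # computed once, on training data only

normalized_train = normalize(train_pixels, mu, sigma)
normalized_test = normalize([0.5, 1.0], mu, sigma)  # reuse the training statistics

print(mu, sigma)
print(normalized_train)
print(normalized_test)
```

When a pre-trained model is used, the same idea applies with the statistics of the dataset it was originally trained on rather than those of the new dataset.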

Summary

At a glance

CNN = Convolution (local patterns) + Pooling (compression) + Hierarchy (simple→complex).