Neural Networks: From Brain to Code

Hook: From Brain to Code

The human brain has roughly 86 billion neurons, each connected to many others. A Neural Network is a mathematical imitation of that idea: each neuron takes inputs, multiplies them by weights, adds the results together, and produces an output.

Perceptron: the smallest unit

Rosenblatt introduced the Perceptron in 1958: one neuron, a few inputs, one output.

y = f(w₁x₁ + w₂x₂ + ... + wₙxₙ + b)

f is the activation function. The w values are the weights and b is the bias; these are the quantities the network learns.
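The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not Rosenblatt's original training procedure: the step activation, the helper names (`step`, `perceptron`), and the hand-picked weights that make the neuron behave like a logical AND are all assumptions chosen for the example.

```python
def step(z):
    """Step activation (assumed for this sketch): 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def perceptron(x, w, b, f=step):
    """Compute y = f(w1*x1 + w2*x2 + ... + wn*xn + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(z)


# Hand-picked (not learned) parameters that reproduce logical AND.
w_and = [1.0, 1.0]
b_and = -1.5

# Evaluate the neuron on all four binary input pairs.
and_outputs = [
    perceptron([x1, x2], w_and, b_and)
    for x1 in (0, 1)
    for x2 in (0, 1)
]
print(and_outputs)  # [0, 0, 0, 1]
```

Only the input (1, 1) pushes the weighted sum above zero, so only that case outputs 1.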

Architecture: how layers are organized

Input Layer: receives the features.
Hidden Layer(s): learn patterns; the deeper the layer, the more complex the features it can represent.
Output Layer: produces the final prediction (regression: 1 neuron; classification: typically one neuron per class).
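To get a feel for what a layer structure costs, here is a small sketch that counts the learnable parameters of a fully connected network. The layer sizes 784, 128, 64, 10 are borrowed from the PyTorch example in this document; the helper name `count_parameters` is hypothetical.

```python
# Sizes of the input layer, two hidden layers, and the output layer.
layer_sizes = [784, 128, 64, 10]


def count_parameters(sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(sizes, sizes[1:]):
        # Each connection has one weight; each output neuron has one bias.
        total += n_in * n_out + n_out
    return total


total_params = count_parameters(layer_sizes)
print(total_params)  # 109386
```

Almost all of the parameters sit in the first layer (784 × 128 weights), which is typical when the input is much wider than the hidden layers.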

Deep Learning

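A network with one or more hidden layers already counts as a Neural Network. With roughly three or more hidden layers it is usually called a Deep Network; the cutoff is a convention, not a strict rule.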

Activation Functions

Sigmoid: σ(x) = 1/(1+e^(-x)), output between 0 and 1, prone to the vanishing gradient problem.
Tanh: output from -1 to 1, zero-centered.
ReLU: max(0, x), fast to compute and the most widely used.
Leaky ReLU: a small slope on the negative side, which reduces the dead neuron problem.
Softmax: turns multi-class outputs into a probability distribution.
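The five functions listed above can be written out with only the standard library. This is an illustrative sketch: the Leaky ReLU slope of 0.01 is an assumed, commonly seen default, and subtracting the maximum inside `softmax` is a numerical-stability step that the list above does not mention.

```python
import math


def sigmoid(x):
    """1 / (1 + e^(-x)); output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Output lies in (-1, 1) and is zero-centered."""
    return math.tanh(x)


def relu(x):
    """max(0, x)."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """Like ReLU, but negative inputs keep a small slope (assumed 0.01)."""
    return x if x > 0 else slope * x


def softmax(zs):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(zs)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(sigmoid(0.0), tanh(0.0), relu(-3.0), leaky_relu(-3.0))
print(probs, sum(probs))
```

The largest score always receives the largest probability, and the outputs of `softmax` add up to 1 (up to floating-point rounding).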

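Why non-linear?

No matter how many layers you stack with a linear activation, the whole network stays linear, because a composition of linear maps is itself a single linear map. Non-linearity is what gives deep learning its power.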
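The collapse can be checked numerically. In the sketch below, two layers with no activation (identity) are compared against a single layer with W = W2·W1 and b = W2·b1 + b2. The matrices and the helper names (`matmul`, `matvec`, `vadd`) are made up for the illustration, and integers are used so the comparison is exact.

```python
def matmul(A, B):
    """Matrix product of two nested-list matrices."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def matvec(A, x):
    """Matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def vadd(u, v):
    """Element-wise vector addition."""
    return [ui + vi for ui, vi in zip(u, v)]


# Arbitrary example parameters: a 2 -> 3 layer followed by a 3 -> 2 layer.
W1 = [[1, 2], [3, 4], [5, 6]]
b1 = [1, 0, -1]
W2 = [[1, 0, 2], [0, 1, -1]]
b2 = [2, 3]
x = [7, -2]

# Two layers with a linear (identity) activation.
h = vadd(matvec(W1, x), b1)
two_layer = vadd(matvec(W2, h), b2)

# One equivalent layer: W = W2 W1, b = W2 b1 + b2.
W = matmul(W2, W1)
b = vadd(matvec(W2, b1), b2)
one_layer = vadd(matvec(W, x), b)

print(two_layer, one_layer)  # [50, -6] [50, -6]
```

Inserting a ReLU between the two layers breaks this equivalence, which is exactly why a non-linear activation adds expressive power.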

Forward Propagation

Input → hidden layer → output: along this path, each layer applies a matrix multiplication followed by an activation.

a^(l) = f(W^(l) · a^(l-1) + b^(l))
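The recurrence above translates directly into a loop over layers. The sketch below uses plain Python lists; the tiny 2 → 3 → 1 network, its weights, and the helper names (`dense`, `forward`, `relu_vec`, `identity`) are assumptions picked so the arithmetic is easy to follow by hand.

```python
def relu_vec(v):
    """Apply ReLU to every element of a vector."""
    return [max(0.0, z) for z in v]


def identity(v):
    """Linear output activation, e.g. for regression."""
    return list(v)


def dense(W, b, a_prev, f):
    """One layer: a = f(W · a_prev + b)."""
    z = [sum(w * a for w, a in zip(row, a_prev)) + bi for row, bi in zip(W, b)]
    return f(z)


def forward(x, layers):
    """Run the input through every (W, b, f) layer in order."""
    a = x
    for W, b, f in layers:
        a = dense(W, b, a, f)
    return a


layers = [
    # Hidden layer: 2 inputs -> 3 neurons, ReLU activation.
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 1.0, -1.0], relu_vec),
    # Output layer: 3 inputs -> 1 neuron, no activation.
    ([[1.0, 2.0, -1.0]], [0.5], identity),
]

output = forward([2.0, 1.0], layers)
print(output)  # [6.5]
```

The hidden pre-activations are [1.0, 2.5, -1.0]; ReLU zeroes the last one, and the output neuron then computes 1.0 + 2·2.5 − 0 + 0.5 = 6.5.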

Loss Function

Regression: MSE, MAE.
Binary Classification: Binary Cross-Entropy.
Multi-class: Categorical Cross-Entropy.
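For reference, the four losses named above can be written out as follows. This is a plain-Python sketch that averages over the examples; the `eps` clamp that keeps `log` away from zero is an added safeguard, and the function names are illustrative rather than taken from any library.

```python
import math


def mse(y_true, y_pred):
    """Mean Squared Error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """y_true holds 0/1 labels, p_pred the predicted probability of class 1."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def categorical_cross_entropy(y_onehot, probs, eps=1e-12):
    """y_onehot holds one-hot rows, probs the predicted distributions."""
    total = 0.0
    for t_row, p_row in zip(y_onehot, probs):
        total += -sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
    return total / len(y_onehot)


print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(binary_cross_entropy([1, 0], [0.5, 0.5]))
print(categorical_cross_entropy([[0, 1, 0]], [[0.25, 0.5, 0.25]]))
```

Both cross-entropy examples evaluate to ln 2 ≈ 0.693, because in each case the model assigns probability 0.5 to the correct class.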

Code: a Simple NN with PyTorch

simple_nn.py

import torch
import torch.nn as nn

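class Net(nn.Module):
    """Fully connected network: 784 inputs -> 128 -> 64 -> 10 outputs."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)  # input -> first hidden layer
        self.fc2 = nn.Linear(128, 64)   # first hidden -> second hidden layer
        self.fc3 = nn.Linear(64, 10)    # second hidden -> one output per class
        self.relu = nn.ReLU()

    def forward(self, x):
        # x has shape (batch_size, 784)
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        # Return raw logits: nn.CrossEntropyLoss applies log-softmax itself
        return self.fc3(x)

model = Net()
loss_fn = nn.CrossEntropyLoss()  # takes raw logits and class-index targets
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
print(model)  # prints the layer structure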
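The script above builds the model, loss, and optimizer but never uses them together. A few training steps might look like the sketch below. The batch is random stand-in data (32 samples of 784 features with random labels), not a real dataset, and repeating five steps on the same batch is only meant to show the order of the calls.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)  # make the random stand-in data reproducible


class Net(nn.Module):
    """Same architecture as in simple_nn.py: 784 -> 128 -> 64 -> 10."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        return self.fc3(x)  # logits


model = Net()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in batch (assumption): random inputs and random class labels.
x = torch.randn(32, 784)
y = torch.randint(0, 10, (32,))

model.train()
losses = []
for _ in range(5):
    optimizer.zero_grad()      # clear gradients from the previous step
    logits = model(x)          # forward propagation
    loss = loss_fn(logits, y)  # compare logits with the target classes
    loss.backward()            # backpropagate to compute gradients
    optimizer.step()           # update weights and biases
    losses.append(loss.item())

print(losses)
```

With real data, the loop would iterate over mini-batches from a data loader instead of reusing one fixed batch.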

Common Mistakes

Skipping feature scaling: unscaled inputs can make gradients explode or vanish.
Using the wrong activation on the output layer (for example, softmax for regression).
Training a very large network on a small dataset: overfitting is almost certain.
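The first mistake in the list is usually avoided by standardizing each feature to zero mean and unit variance before training. The sketch below shows one common way to do that with the standard library; the `standardize` helper and the income values are hypothetical, and other scaling schemes (such as min-max scaling) are equally valid.

```python
import statistics


def standardize(column):
    """Rescale one feature column to zero mean and unit standard deviation."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)  # population standard deviation
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]


# Hypothetical raw feature with a large numeric range.
incomes = [20000.0, 35000.0, 50000.0, 80000.0]
scaled = standardize(incomes)
print(scaled)
print(statistics.mean(scaled), statistics.pstdev(scaled))
```

In practice the mean and standard deviation are typically computed on the training set only and then reused for validation and test data.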

Summary

At a glance

Neural Network = layer + weight + non-linear activation + loss + optimizer.