LLM Fundamentals: Machine Learning

Hook: What is an LLM, and how does it work?

GPT-4, Claude, Gemini, and LLaMA are all LLMs. Their core task is the same: predict the next token. Yet with billions of parameters and trillions of tokens of training data, remarkable capabilities arise from that simple task (emergence).

Architecture

Decoder-only Transformer (GPT family).
Causal self-attention: each position attends only to itself and earlier tokens.
Stack of N blocks: Attention + FFN + LayerNorm + Residual.
Modern variants: RoPE positional encoding, RMSNorm, SwiGLU activation, Grouped-Query Attention.
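
To make the causal masking concrete, here is a minimal single-head sketch in NumPy. It is illustrative only: the function name, the random weight matrices, and the tiny dimensions are assumptions, and a real block would add multiple heads, an output projection, normalization, and residual connections.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention over a (seq_len, d_model) input."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_head = q.shape[-1]
    scores = (q @ k.T) / np.sqrt(d_head)  # (seq_len, seq_len)
    # Mask out future positions: position i may only look at j <= i.
    seq_len = x.shape[0]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Row-wise softmax (subtract the row max for numerical stability).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Hypothetical toy sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, weights = causal_self_attention(x, w_q, w_k, w_v)
print(np.round(weights, 2))  # entries above the diagonal are zero
```

The first row of the weight matrix puts all its mass on the first token, since that position has nothing earlier to attend to.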

Tokenization

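Text → token IDs. LLMs use subword tokenizers: GPT models use byte-level BPE (implemented in OpenAI's tiktoken library), and LLaMA 1/2 use SentencePiece. An English word is typically 1-2 tokens; Bangla text takes considerably more tokens for the same content.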

tokens.py

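from transformers import AutoTokenizer

# GPT-2 uses byte-level BPE, so any Unicode text can be tokenized.
tok = AutoTokenizer.from_pretrained("gpt2")
# Bangla words split into many byte-level pieces; English words into few.
print(tok.tokenize("Lovable দিয়ে app বানানো সহজ!"))
# encode() returns integer token ids instead of token strings.
print(tok.encode("Hello world"))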
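
One likely reason Bangla costs more tokens with a byte-level BPE tokenizer such as GPT-2's is that the tokenizer starts from UTF-8 bytes, and each Bangla character takes 3 bytes where an ASCII letter takes 1. The sketch below only counts bytes (the helper name is hypothetical). Actual token counts depend on the merges the tokenizer learned, so treat this as a rough intuition rather than a measurement.

```python
def utf8_bytes_per_char(text):
    """Average number of UTF-8 bytes per character in `text`."""
    return len(text.encode("utf-8")) / len(text)

english = "easy"
bangla = "সহজ"  # "easy" in Bangla, 3 characters

print(utf8_bytes_per_char(english))  # 1.0
print(utf8_bytes_per_char(bangla))   # 3.0
```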

Training Pipeline

Pretraining: next-token prediction over trillions of tokens; produces the base model.
SFT: Supervised Fine-Tuning on instruction data.
RLHF / DPO: align the model with human preferences (helpful, harmless).
RAG / tool use: not a training stage; adds external knowledge at runtime.
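
The pretraining objective above can be sketched as a shifted cross-entropy: the output at position t is scored against the token at position t+1. This is a minimal NumPy illustration with made-up logits and token ids, not a training loop. The function name and toy values are assumptions.

```python
import numpy as np

def next_token_loss(logits, token_ids):
    """Mean cross-entropy of predicting token t+1 from position t.

    logits:    (seq_len, vocab_size) scores, one row per input position
    token_ids: (seq_len,) integer ids of the same sequence
    """
    # Position t predicts token t+1, so drop the last row and the first id.
    pred = logits[:-1]
    target = token_ids[1:]
    # Log-softmax, computed stably.
    pred = pred - pred.max(axis=-1, keepdims=True)
    log_probs = pred - np.log(np.exp(pred).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(target)), target].mean()

vocab_size = 5
token_ids = np.array([0, 3, 1, 4])
uniform_logits = np.zeros((len(token_ids), vocab_size))
print(next_token_loss(uniform_logits, token_ids))  # log(5) ≈ 1.609
```

A model that knows nothing (uniform logits) scores log(vocab_size); training pushes the loss below that by raising the probability of the actual next token.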

Scaling Laws

The Chinchilla paper showed that, for a fixed compute budget, parameters and data matter equally and should be scaled together. Rule of thumb: roughly 20 training tokens per parameter is compute-optimal.
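
The 20-tokens-per-parameter rule can be turned into quick arithmetic. The sketch below additionally assumes the common approximation that training cost is about 6 FLOPs per parameter per token (C ≈ 6·N·D). That constant is not stated in the text above, and the function names are hypothetical.

```python
import math

TOKENS_PER_PARAM = 20      # rule of thumb from the text
FLOPS_PER_PARAM_TOKEN = 6  # assumed approximation: C ≈ 6 * N * D

def optimal_tokens(n_params):
    """Training tokens suggested by the ~20 tokens/parameter rule."""
    return TOKENS_PER_PARAM * n_params

def compute_optimal_size(flops_budget):
    """Solve C = 6 * N * (20 * N) for N, then set D = 20 * N."""
    n_params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    return n_params, optimal_tokens(n_params)

print(f"{optimal_tokens(70e9):.2e}")  # 1.40e+12 tokens for a 70B-parameter model
n, d = compute_optimal_size(1e23)
print(f"N ≈ {n:.2e} params, D ≈ {d:.2e} tokens")
```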

Sampling / Decoding

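Greedy: always pick the highest-probability token.
Temperature: controls randomness (0 = deterministic, equivalent to greedy; 1 = the model's unmodified distribution; higher values are more random).
Top-k / Top-p (nucleus): sample only from a restricted set of tokens, either the k most likely or the smallest set whose cumulative probability reaches p.
Beam search: tracks several candidate sequences in parallel; traditionally used for summarization and translation.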
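
The sampling strategies above can be combined in one small function. This is a sketch under simple assumptions (a single logits vector, NumPy only, a hypothetical function name), not how any particular library implements decoding.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Pick a token id from raw logits using temperature, top-k and top-p."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    if temperature == 0:
        return int(np.argmax(logits))  # greedy decoding
    logits = logits / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    order = np.argsort(-probs)              # token ids, most likely first
    keep = np.ones(len(probs), dtype=bool)  # mask aligned with `order`
    if top_k is not None:
        keep[top_k:] = False                # keep the k most likely tokens
    if top_p is not None:
        cumulative = np.cumsum(probs[order])
        # Keep the smallest prefix whose cumulative probability reaches top_p.
        keep &= (cumulative - probs[order]) < top_p
    kept_ids = order[keep]
    kept_probs = probs[kept_ids] / probs[kept_ids].sum()
    return int(rng.choice(kept_ids, p=kept_probs))

logits = [2.0, 1.0, 0.5, -1.0]
print(sample_next_token(logits, temperature=0))  # 0
print(sample_next_token(logits, top_k=1))        # 0
```

With `top_k=1` or `temperature=0` the result is always the most likely token; loosening either setting lets lower-probability tokens through.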

Famous LLMs

Closed: GPT-4o, Claude 3.5/4, Gemini 2.
Open: LLaMA 3, Mistral, Mixtral, Qwen, Gemma, DeepSeek.
Small: Phi-3, Gemma 2B; suited to on-device use.
Reasoning: o1, o3, DeepSeek-R1; trained to reason with chain-of-thought.

Code — Local LLM Inference

llama_chat.py

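from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Gated checkpoint: requires accepting the license on Hugging Face and logging in.
name = "meta-llama/Llama-3.2-3B-Instruct"
tok = AutoTokenizer.from_pretrained(name)
m = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16, device_map="auto")

msgs = [{"role": "user", "content": "Explain gradient descent in Bangla"}]
# Render the chat template to a string ending with the assistant header.
prompt = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
# The template already contains the BOS token, so do not add special tokens again.
ids = tok(prompt, return_tensors="pt", add_special_tokens=False).to(m.device)
# temperature/top_p only take effect when sampling is enabled.
out = m.generate(**ids, max_new_tokens=300, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = out[0][ids["input_ids"].shape[1]:]
print(tok.decode(new_tokens, skip_special_tokens=True))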

Limitations

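Hallucination: confidently stated but wrong output.
Knowledge cutoff: no awareness of events after the training data ends.
Limited context window.
Weak at math and multi-step reasoning (reasoning models do better).
Bias inherited from training data.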

Summary

At a glance

LLM = decoder-only Transformer + tokenizer + pretraining → SFT → RLHF. Scaling + alignment + tool use = the foundation of modern AI agents.