Diffusion Models: Machine Learning

Hook: From Noise to Images

Stable Diffusion, DALL·E 3, Midjourney, FLUX, and Sora all build on diffusion-style models. The core idea is strangely elegant: gradually add noise to an image, then train a network to remove that noise. At inference time, start from pure noise and denoise step by step to produce a new image.

Forward & Reverse Process

Forward (q): add Gaussian noise over T steps until only pure noise remains.
Reverse (pθ): a neural network learns how much noise to remove at each step.

q(x_t | x_{t-1}) = N(x_t; √(1-βₜ)·x_{t-1}, βₜ·I)
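
Composing the single-step kernel above over many steps gives a well-known closed form, x_t = √ᾱₜ·x_0 + √(1−ᾱₜ)·ε with ᾱₜ = ∏(1−βₛ), so x_t can be sampled directly from x_0 without looping. The sketch below illustrates this in plain Python. The linear β schedule (1e-4 to 0.02 over T = 1000), the helper names, and the toy 4-value "image" are assumptions made for illustration, not values from the text.

```python
import math
import random

def linear_betas(T=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule: beta_1 .. beta_T."""
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]

def alpha_bars(betas):
    """Cumulative products: alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out

def q_sample(x0, t, abar, rng):
    """Sample x_t ~ q(x_t | x_0) in one shot; t is a 0-based step index."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    signal, noise = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    x_t = [signal * x + noise * e for x, e in zip(x0, eps)]
    return x_t, eps

betas = linear_betas()
abar = alpha_bars(betas)
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]                 # toy 4-"pixel" image
x_early, _ = q_sample(x0, 10, abar, rng)   # still close to x0
x_late, _ = q_sample(x0, 999, abar, rng)   # almost pure Gaussian noise
print(abar[0], abar[-1])
```

With this schedule ᾱ at the last step is very close to zero, which is why the forward process ends in (almost) pure noise.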

Training Objective

The network ε_θ(x_t, t) predicts the noise that was added.

L = E[ ‖ ε − ε_θ(x_t, t) ‖² ]
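
One training step under this objective can be sketched as follows: pick a random step t, noise a clean sample, and score the network's noise prediction with a squared error. The `toy_eps_model` below is a hypothetical stand-in for ε_θ (a real model would be a U-Net), the linear β schedule is an assumption, and the error is averaged over dimensions as MSE losses commonly are.

```python
import math
import random

T = 1000
# Assumed linear beta schedule and its cumulative products alpha_bar_t.
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
abar, running = [], 1.0
for beta in betas:
    running *= 1.0 - beta
    abar.append(running)

def toy_eps_model(x_t, t):
    """Hypothetical stand-in for eps_theta(x_t, t): always predicts zero noise."""
    return [0.0 for _ in x_t]

def diffusion_loss(x0, eps_model, rng):
    """Single-sample estimate of E[ ||eps - eps_theta(x_t, t)||^2 ], mean over dims."""
    t = rng.randrange(T)                              # uniform random step
    eps = [rng.gauss(0.0, 1.0) for _ in x0]           # the noise to recover
    a, b = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    x_t = [a * x + b * e for x, e in zip(x0, eps)]    # noised input
    pred = eps_model(x_t, t)
    return sum((e - p) ** 2 for e, p in zip(eps, pred)) / len(x0)

rng = random.Random(0)
x0 = [0.2, -0.7, 1.0, 0.0]
loss = diffusion_loss(x0, toy_eps_model, rng)
print(loss)
```

In real training the same quantity is averaged over a batch and minimized with gradient descent on the parameters θ.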

Architecture

U-Net: the denoising backbone (with skip connections).
Time embedding: tells the network which step it is denoising.
Cross-attention: injects the text condition (e.g. a CLIP text embedding).
Latent Diffusion (Stable Diffusion): runs diffusion in a VAE latent space instead of pixel space, which is much faster.
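
The time embedding is commonly a Transformer-style sinusoidal encoding of the step index, which the U-Net then passes through a small MLP. A minimal sketch, assuming that sinusoidal form; the function name, the dimension of 8, and the `max_period` of 10000 are illustrative choices, and `dim` is assumed to be even.

```python
import math

def timestep_embedding(t, dim=8, max_period=10000.0):
    """Sinusoidal embedding of step t: sines first, then cosines."""
    half = dim // 2
    # Geometrically spaced frequencies from 1 down towards 1 / max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(t * w) for w in freqs] + [math.cos(t * w) for w in freqs]

emb_0 = timestep_embedding(0)
emb_500 = timestep_embedding(500)
print(len(emb_500))
```

Each step maps to a distinct vector, so one shared network can behave differently at high and low noise levels.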

Guidance & Sampling

Classifier-Free Guidance (CFG): controls how strongly the sample follows the text prompt.
DDPM: the original sampler, about 1000 steps.
DDIM: deterministic, about 50 steps.
DPM-Solver: good quality in about 20 steps.
LCM, Turbo: 1 to 4 steps, near real time.
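
Two items from the list above fit in a few lines. CFG combines an unconditional and a text-conditioned noise prediction. A deterministic DDIM update (η = 0) jumps from one noise level to an earlier one, which is what makes skipping steps possible. The sketch below assumes those standard formulations; the function names and toy numbers are illustrative, and `guidance_scale` follows the convention where 1.0 means no extra guidance.

```python
import math

def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """eps = eps_uncond + s * (eps_cond - eps_uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

def ddim_step(x_t, eps, abar_t, abar_prev):
    """Deterministic DDIM update (eta = 0) from noise level abar_t to abar_prev."""
    # Estimate the clean sample implied by the current noise prediction.
    x0_pred = [(x - math.sqrt(1.0 - abar_t) * e) / math.sqrt(abar_t)
               for x, e in zip(x_t, eps)]
    # Re-noise that estimate to the earlier (less noisy) level.
    return [math.sqrt(abar_prev) * x0 + math.sqrt(1.0 - abar_prev) * e
            for x0, e in zip(x0_pred, eps)]

# CFG on toy predictions: only the component where the two disagree is amplified.
guided = cfg_combine([0.0, 1.0], [1.0, 1.0], guidance_scale=7.0)

# DDIM with a perfect noise prediction, jumping straight to the clean level.
x0_true, eps_true, abar_t = [0.5, -0.5], [1.0, -2.0], 0.25
x_t = [math.sqrt(abar_t) * x + math.sqrt(1.0 - abar_t) * e
       for x, e in zip(x0_true, eps_true)]
recovered = ddim_step(x_t, eps_true, abar_t, abar_prev=1.0)
print(guided, recovered)
```

When the noise prediction is exact, a single DDIM jump recovers the clean sample; in practice the prediction is imperfect, so samplers take several smaller jumps.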

Code: Generation with Diffusers

sd_xl.py

import torch
from diffusers import StableDiffusionXLPipeline

# Load the SDXL base weights in half precision and move the pipeline to the GPU.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

img = pipe(
    prompt="serene japanese garden, cherry blossoms, soft morning light, 8k",
    negative_prompt="blurry, low quality",  # traits to steer away from
    num_inference_steps=30,                 # number of denoising steps
    guidance_scale=7.0,                     # CFG strength
).images[0]
img.save("garden.png")

Image-to-Image

img2img.py

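import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image
pipe = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16).to("cuda")
init_img = load_image("input.png").resize((512, 512))  # hypothetical input path
# strength in [0, 1]: higher values add more noise and drift further from init_img
out = pipe(prompt="oil painting style", image=init_img, strength=0.6).images[0]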
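
The `strength` argument decides how far into the noise schedule the input image is pushed before denoising begins. In the Diffusers img2img pipelines, roughly `int(num_inference_steps * strength)` of the scheduled steps are actually run. The helper below is a hypothetical illustration of that arithmetic, not a library function, and the step count of 50 is an assumed default.

```python
def img2img_steps(num_inference_steps=50, strength=0.6):
    """Return (steps_skipped, steps_run) for a strength in [0, 1]."""
    steps_run = min(int(num_inference_steps * strength), num_inference_steps)
    steps_skipped = max(num_inference_steps - steps_run, 0)
    return steps_skipped, steps_run

skipped, run = img2img_steps(50, 0.6)
print(skipped, run)
```

A low strength therefore keeps most of the input image and is also cheaper, because fewer denoising steps are executed.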

Control & Customization

ControlNet: controls structure through edge, pose, or depth maps.
LoRA: lightweight fine-tuning (your own style or face).
DreamBooth: injects a new concept into the model.
Inpainting: replaces part of an image.
IP-Adapter: uses an image as the reference.
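
LoRA is lightweight because it freezes a weight matrix W (d × k) and trains only a low-rank update B·A, with B of shape d × r and A of shape r × k. The parameter count comparison below is a sketch; the 768 × 768 layer size and rank 8 are illustrative assumptions, not values taken from any particular model.

```python
def lora_param_counts(d, k, r):
    """Trainable parameters: full fine-tune of W vs. the LoRA factors B and A."""
    full = d * k          # every entry of W is trainable
    lora = r * (d + k)    # B is d x r, A is r x k
    return full, lora

full, lora = lora_param_counts(d=768, k=768, r=8)
print(full, lora, lora / full)
```

For this assumed layer the LoRA factors hold about 2% of the parameters of the full matrix, which is why LoRA files are small and fast to train.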

Summary

At a glance

Diffusion = add noise → learn to denoise → generate new images from noise. Latent + CFG + ControlNet/LoRA = the modern generative image stack.