Regression: Predicting Numbers (Machine Learning)

Hook: What Will the Price Be?

You want to buy a house. It is 1200 sqft with 3 bedrooms, so what should it cost? A real estate agent looks at the prices of a few similar houses and gives you an estimate. A regression algorithm does exactly the same job: it predicts a continuous number from past data.

Concept — Continuous Output

Regression is a supervised learning task in which the output is a real number. Predicting price, temperature, salary, or rainfall are all regression problems.

Types

Simple Linear Regression: a single feature.
Multiple Linear Regression: many features.
Polynomial Regression: fits a non-linear curve.
Ridge / Lasso: regularized linear regression.
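The variants listed above can be tried side by side in scikit-learn. The following is a minimal sketch, assuming a small synthetic dataset with a quadratic relationship (the data, the `alpha` values, and the model labels are illustrative choices, not part of the lesson). It fits a plain linear model, a degree-2 polynomial model, and Ridge and Lasso versions of the polynomial model, then prints each training R².

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumed for illustration): y depends on x quadratically.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = 2.0 * x[:, 0] ** 2 + x[:, 0] + rng.normal(0, 0.5, size=200)

# One entry per regression type; alpha values are arbitrary illustrative picks.
models = {
    "simple linear": LinearRegression(),
    "polynomial (degree 2)": make_pipeline(
        PolynomialFeatures(degree=2), LinearRegression()
    ),
    "ridge (alpha=1.0)": make_pipeline(
        PolynomialFeatures(degree=2), Ridge(alpha=1.0)
    ),
    "lasso (alpha=0.1)": make_pipeline(
        PolynomialFeatures(degree=2), Lasso(alpha=0.1)
    ),
}

scores = {}
for name, model in models.items():
    model.fit(x, y)
    scores[name] = model.score(x, y)  # R^2 on the training data
    print(f"{name}: R^2 = {scores[name]:.3f}")
```

On data like this the straight line should score far lower than the polynomial variants, because a line cannot follow the curve. Multiple Linear Regression uses the same `LinearRegression` class, just with more than one column in the feature matrix.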

Math — Loss ও Metric

Loss: Mean Squared Error

MSE = (1/n) Σ (yᵢ − ŷᵢ)²
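To make the formula concrete, here is a small sketch that computes MSE by hand and compares it with scikit-learn's `mean_squared_error`. The four actual and predicted values are made-up numbers chosen only so the arithmetic is easy to follow.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up values for illustration
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 210.0, 230.0])

errors = y_true - y_pred             # [-10, 10, -10, 20]
mse_manual = np.mean(errors ** 2)    # (100 + 100 + 100 + 400) / 4 = 175
mse_sklearn = mean_squared_error(y_true, y_pred)

print("manual MSE: ", mse_manual)
print("sklearn MSE:", mse_sklearn)
```

Squaring means the single error of 20 contributes 400, as much as four errors of 10 would, which is why MSE punishes large misses heavily.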

Evaluation: R² Score

R² = 1 − SS_res / SS_tot

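An R² close to 1 means the model explains most of the variance in the target. An R² of 0 means the model is no better than always predicting the mean, and on held-out data R² can even be negative.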
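The R² formula can be checked by hand in the same way. This sketch uses made-up values: it computes SS_res and SS_tot directly, compares the result with scikit-learn's `r2_score`, and confirms that a "model" that always predicts the mean scores 0.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up values for illustration
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 210.0, 230.0])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares = 700
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares = 12500
r2_manual = 1 - ss_res / ss_tot                  # 1 - 700/12500 = 0.944

# Baseline: always predict the mean of y_true
baseline_pred = np.full_like(y_true, y_true.mean())
r2_baseline = r2_score(y_true, baseline_pred)    # SS_res equals SS_tot, so R^2 = 0

print("manual R^2:  ", r2_manual)
print("sklearn R^2: ", r2_score(y_true, y_pred))
print("baseline R^2:", r2_baseline)
```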

Other Metrics

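MAE: Mean Absolute Error, less sensitive to outliers than MSE.
RMSE: the square root of MSE, expressed in the original units of the target.
MAPE: Mean Absolute Percentage Error, the error as a percentage of the actual value.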
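These three metrics can be computed as follows. This is a sketch on made-up values. The second half replaces one prediction with a badly wrong one (an assumed outlier) to show that RMSE grows faster than MAE. Note that scikit-learn's `mean_absolute_percentage_error` returns a fraction, so it is multiplied by 100 for display.

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
)

# Made-up values for illustration
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 210.0, 230.0])

mae = mean_absolute_error(y_true, y_pred)               # (10+10+10+20)/4 = 12.5
rmse = np.sqrt(mean_squared_error(y_true, y_pred))      # sqrt(175), about 13.23
mape = mean_absolute_percentage_error(y_true, y_pred)   # fraction, about 0.0742

print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"MAPE: {mape * 100:.2f}%")

# Same predictions, but the last one is now off by 200 (an assumed outlier)
y_pred_outlier = np.array([110.0, 140.0, 210.0, 50.0])
mae_out = mean_absolute_error(y_true, y_pred_outlier)
rmse_out = np.sqrt(mean_squared_error(y_true, y_pred_outlier))

print(f"MAE grew by a factor of  {mae_out / mae:.1f}")
print(f"RMSE grew by a factor of {rmse_out / rmse:.1f}")
```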

Real-world Examples

House price prediction.
Electricity consumption forecasting.
Sales prediction.
Length of a patient's hospital stay.

Code — Multiple Linear Regression

multi_regression.py

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

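# Synthetic housing data: price is a known linear function of the features plus noise
np.random.seed(0)  # fixed seed so the numbers are reproducible
n = 200
area = np.random.randint(500, 3000, n)   # floor area in sqft, 500 to 2999
rooms = np.random.randint(1, 5, n)       # bedrooms, 1 to 4 (upper bound is exclusive)
age = np.random.randint(0, 30, n)        # age of the house in years, 0 to 29
# True relationship: +120 per sqft, +50000 per room, -2000 per year of age, noise std 20000
price = area * 120 + rooms * 50000 - age * 2000 + np.random.normal(0, 20000, n)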

df = pd.DataFrame({"area": area, "rooms": rooms, "age": age, "price": price})

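# Features (X) and target (y)
X = df[["area", "rooms", "age"]]
y = df["price"]
# Hold out 20% of the rows to evaluate on data the model has not seen
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(Xtr, ytr)
pred = model.predict(Xte)

# The learned coefficients should land near the true values 120, 50000, -2000
print("Coefficients:", dict(zip(X.columns, model.coef_)))
print("MSE:", mean_squared_error(yte, pred))
print("R^2:", r2_score(yte, pred))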
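As a usage example, a fitted model can estimate the price of the house from the opening question (1200 sqft, 3 bedrooms). The sketch below rebuilds the same synthetic data so that it runs on its own. For brevity it fits on all 200 rows rather than a train split, and the house's age of 10 years is an assumption, since the opening example does not give one.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same synthetic data recipe as multi_regression.py
np.random.seed(0)
n = 200
area = np.random.randint(500, 3000, n)
rooms = np.random.randint(1, 5, n)
age = np.random.randint(0, 30, n)
price = area * 120 + rooms * 50000 - age * 2000 + np.random.normal(0, 20000, n)
df = pd.DataFrame({"area": area, "rooms": rooms, "age": age, "price": price})

features = ["area", "rooms", "age"]
model = LinearRegression().fit(df[features], df["price"])

# The house from the opening example; age=10 is an assumed value
house = pd.DataFrame({"area": [1200], "rooms": [3], "age": [10]})
estimate = model.predict(house)[0]

print(f"Estimated price: {estimate:,.0f}")
```

Without noise, the data-generating formula gives 1200 × 120 + 3 × 50000 − 10 × 2000 = 274000 for this house, so the estimate should come out close to that value.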

Common Mistakes

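Using linear regression on non-linear data: the model underfits.
Multicollinearity: two features carry almost the same information, which makes the individual coefficients unstable.
Ignoring outliers: a few extreme points can swing the entire fitted line.
Extrapolation: trusting predictions outside the range of the training data.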
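The underfitting and extrapolation mistakes can be seen together in a small experiment. This sketch assumes synthetic data where y is roughly x² and x lies between 0 and 5 (the data and the two test points are illustrative choices). A straight line is fitted, then asked for a prediction inside the training range (x = 4) and far outside it (x = 10).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic curved data (assumed for illustration): y is about x^2 on [0, 5]
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 5, size=(100, 1))
y_train = x_train[:, 0] ** 2 + rng.normal(0, 0.5, size=100)

# A straight line fitted to curved data
model = LinearRegression().fit(x_train, y_train)

inside = model.predict([[4.0]])[0]     # x = 4 is inside the training range
outside = model.predict([[10.0]])[0]   # x = 10 is far outside it

err_inside = abs(inside - 4.0 ** 2)      # true value is 16
err_outside = abs(outside - 10.0 ** 2)   # true value is 100

print(f"x=4:  predicted {inside:.1f}, true 16, error {err_inside:.1f}")
print(f"x=10: predicted {outside:.1f}, true 100, error {err_outside:.1f}")
```

Inside the training range the line happens to stay near the curve, but outside it the error grows very large, because the line keeps its constant slope while the true curve keeps bending upward.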

Practice Tasks

Task 1: Drop one feature and see how much R² falls.
Task 2: Add polynomial features (degree=2) and compare the results.
Task 3: Draw a scatter plot of actual vs predicted values.

Mini Project — Bike Rental Predictor

Take the UCI “Bike Sharing” dataset. Use temperature, humidity, hour, and season as features to predict the rental count. Aim for R² > 0.7.

Summary

At a Glance

Regression = predicting a continuous number, evaluated with MSE / R².