Decision Tree: Decisions Shaped Like a Tree

Hook: A Staircase of Yes/No Questions

When diagnosing an illness, a doctor asks a series of questions: “Do you have a fever?” → yes → “A cough?” → no → … Each yes/no answer leads to the next question, so the whole process branches like a tree. That is a Decision Tree.

Concept — Recursive Splitting

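At each node, the algorithm picks the feature and threshold that split the data into the two “purest” groups, meaning each group is dominated by a single class as far as possible. Splitting continues recursively until a node is pure or a stop condition (such as a maximum depth) is reached.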
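The recursion described above can be sketched in plain Python. This is a minimal illustration, not how scikit-learn implements it: the helper names (`gini`, `best_split`, `build`), the dictionary-based tree, and the toy patient data are all assumptions made up for this example. Purity is measured with Gini impurity, which is defined in the Math section.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over the classes in labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature_index, threshold) with the lowest weighted Gini,
    or None if no split is purer than the node itself."""
    best, best_score = None, gini(labels)
    for j in range(len(rows[0])):
        # Skip the largest value: splitting there leaves the right side empty.
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [l for r, l in zip(rows, labels) if r[j] <= t]
            right = [l for r, l in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (j, t), score
    return best

def build(rows, labels, depth=0, max_depth=3):
    """Recursively split until the node is pure or max_depth is reached."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        # Leaf: predict the most common class among the remaining samples.
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    j, t = split
    left = [(r, l) for r, l in zip(rows, labels) if r[j] <= t]
    right = [(r, l) for r, l in zip(rows, labels) if r[j] > t]
    return {"feature": j, "threshold": t,
            "left": build([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
            "right": build([r for r, _ in right], [l for _, l in right], depth + 1, max_depth)}

# Toy data (made up): [body temperature in deg C, has_cough (0 or 1)]
rows = [[36.5, 0], [36.8, 1], [38.2, 0], [38.9, 1], [39.4, 1], [37.0, 0]]
labels = ["healthy", "healthy", "cold", "flu", "flu", "healthy"]

tree = build(rows, labels)
print(tree)
```

On this toy data the root splits on temperature at 37.0, and the warmer group is split again at 38.2, which mirrors the doctor's chain of questions from the hook.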

Math — Impurity Measure

Gini Impurity

Gini = 1 − Σ pᵢ²
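As a quick check of the formula, the sketch below computes Gini impurity for a list of class labels. The function name `gini` and the sample labels are illustrative assumptions, not part of any library.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_i^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

pure = gini(["a", "a", "a", "a"])    # one class only
mixed = gini(["a", "a", "b", "b"])   # two classes, 50/50

print(pure)   # 0.0
print(mixed)  # 0.5
```

A pure node scores 0, and an evenly mixed two-class node scores 0.5, the maximum for two classes.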

Entropy and Information Gain

H = − Σ pᵢ · log₂(pᵢ)

IG = H(parent) − Σ (|child|/|parent|) · H(child)

At each split, the feature/threshold pair that gives the highest Information Gain (or the largest reduction in Gini impurity) is chosen.
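The two formulas and the selection rule can be tried on a single numeric feature. The following is a rough sketch under stated assumptions: the function names, the choice of midpoints between adjacent values as candidate thresholds, and the income data are all made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum(p_i * log2(p_i)) over the classes present in labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """IG = H(parent) - sum(|child| / |parent| * H(child))."""
    weighted = sum(len(ch) / len(parent) * entropy(ch) for ch in children)
    return entropy(parent) - weighted

def best_threshold(values, labels):
    """Try the midpoint between each pair of adjacent distinct values
    and keep the one with the highest information gain."""
    distinct = sorted(set(values))
    best_t, best_gain = None, 0.0
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        gain = information_gain(labels, [left, right])
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# Made-up data: income in thousands, and whether the loan was repaid.
income = [20, 25, 30, 45, 50, 60]
repaid = ["no", "no", "no", "yes", "yes", "yes"]

threshold, gain = best_threshold(income, repaid)
print(threshold, gain)  # 37.5 1.0
```

Here the parent has entropy 1 bit (three "yes", three "no"), and splitting at 37.5 produces two pure children, so the gain equals the full 1 bit.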

Real-world Use

Credit scoring: simple rules can be extracted from the tree.
Medical diagnosis: the decisions are interpretable.
Customer churn: shows which features are associated with customers leaving.
Building block of Random Forest and XGBoost.

Code — Sklearn Decision Tree

tree_iris.py

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

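# Load the dataset once so the feature and class names can be reused for plotting.
iris = load_iris()
X, y = iris.data, iris.target
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=42)

# A shallow tree (max_depth=3) stays readable and limits overfitting.
model = DecisionTreeClassifier(max_depth=3, criterion="gini", random_state=42)
model.fit(Xtr, ytr)
print("Train acc:", model.score(Xtr, ytr))
print("Test acc :", model.score(Xte, yte))

# Draw the fitted tree; class_names is passed as a plain list of strings.
plt.figure(figsize=(10, 6))
plot_tree(model, filled=True, feature_names=iris.feature_names,
          class_names=list(iris.target_names))
plt.show()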

Common Mistakes

Warning

Without a depth limit, the tree memorizes the entire training set, which leads to severe overfitting.

Not setting `max_depth` or `min_samples_leaf`.
Not pruning on imbalanced data.
Treating categorical features directly as integers.
Relying on a single tree; an ensemble (RF/XGB) usually performs better.
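The first mistake is easy to reproduce. The sketch below compares an unrestricted tree with one limited by `max_depth` and `min_samples_leaf`. The synthetic dataset and every parameter value are assumptions chosen only for illustration; exact scores will vary with the data and the scikit-learn version.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, deliberately noisy data: flip_y=0.2 assigns random labels
# to about 20% of the samples.
X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                           flip_y=0.2, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

# No limits: the tree keeps splitting until every training leaf is pure.
deep = DecisionTreeClassifier(random_state=0).fit(Xtr, ytr)

# Limited depth and leaf size: the tree is forced to stay small.
shallow = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
                                 random_state=0).fit(Xtr, ytr)

for name, m in [("unrestricted", deep), ("restricted", shallow)]:
    print(f"{name:12s} depth={m.get_depth():2d} "
          f"train={m.score(Xtr, ytr):.2f} test={m.score(Xte, yte):.2f}")
```

The unrestricted tree should fit the training set perfectly, noise included, while growing much deeper than the restricted one. The gap between its train and test accuracy is the overfitting that the warning describes.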

Practice Tasks

Task 1: Plot train/test accuracy for depths 1 through 20 and find the point where overfitting begins.
Task 2: Compare the results of Gini vs Entropy.
Task 3: Inspect `feature_importances_` to see which features matter most.

Mini Project — Loan Approval Tree

Train a Decision Tree on the Kaggle “Loan Prediction” dataset. Visualize the tree, extract the top 3 rules, and explain in Bengali why each decision is made.
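One possible way to read rules off a fitted tree is scikit-learn's `export_text`. The sketch below uses the Iris data as a stand-in, since the column names of the loan dataset are not given here; swapping in the loan features and labels is left to the project.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in data: replace with the loan features and labels for the project.
iris = load_iris()
model = DecisionTreeClassifier(max_depth=2, random_state=42)
model.fit(iris.data, iris.target)

# export_text prints one line per node; each root-to-leaf path is one rule.
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)
```

Each indented path ending in a `class:` line can be rewritten as an "if ... and ... then ..." sentence, which is a convenient starting point for explaining a decision.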

Summary

At a Glance

Decision Tree = recursive splits that increase purity = one of the most interpretable models.