Machine Learning

From Zero to AI Engineer

Phase 5 · Specialized AI

5.1 · 30 min read

NLP: Understanding Language

Natural Language Processing

Tokenization, embeddings, BERT.

Hook: Understanding Language

Human language is ambiguous and context-dependent. NLP turns that language into something a computer can work with; search, translation, chatbots, and summarization are all applications of it.

Classical NLP Pipeline

Tokenization: text → tokens.
Lowercasing, stopword removal.
Stemming / Lemmatization: reduce each word to its base form.
POS tagging, NER.
Vectorization: Bag-of-Words, TF-IDF.
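
The first steps and the final vectorization step of this pipeline can be sketched in plain Python. This is a minimal illustration, not a production pipeline: the regex tokenizer, the tiny `STOPWORDS` set, and the function names `preprocess` and `bag_of_words` are assumptions made for this example. Stemming, POS tagging, and NER are left out because they normally come from a library such as NLTK or spaCy.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "is", "a", "an", "of", "and", "on"}

def preprocess(text):
    """Lowercase, tokenize on runs of letters/digits, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of raw documents."""
    tokenized = [preprocess(d) for d in docs]
    # Sorted vocabulary gives every document the same column order.
    vocab = sorted({t for doc in tokenized for t in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors

docs = ["The cat sat on the mat.", "The dog chased the cat!"]
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['cat', 'chased', 'dog', 'mat', 'sat']
print(vectors)  # [[1, 0, 0, 1, 1], [1, 1, 1, 0, 0]]
```

Each document becomes a fixed-length count vector over the shared vocabulary. TF-IDF starts from the same counts and down-weights terms that appear in many documents.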

Word Embeddings

Word2Vec: CBOW, Skip-gram.
GloVe: global co-occurrence statistics.
FastText: subword information.
Contextual (ELMo, BERT): the same word gets a different vector depending on its context.

king − man + woman ≈ queen
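
The analogy above is vector arithmetic followed by a nearest-neighbour search. The sketch below reproduces it with hand-made 2-D vectors; the numbers in `VECTORS` and the helper names `cosine` and `analogy` are invented for illustration and do not come from any trained model.

```python
import math

# Hand-made 2-D toy vectors (an assumption for illustration only):
# dimension 0 loosely means "royalty", dimension 1 loosely means "gender".
# Real embeddings are learned from data and have hundreds of dimensions.
VECTORS = {
    "king":  [0.9, 0.9],
    "queen": [0.9, -0.9],
    "man":   [0.1, 0.9],
    "woman": [0.1, -0.9],
    "apple": [-0.8, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, VECTORS[w]))

print(analogy("king", "man", "woman"))  # queen
```

With trained embeddings the match is only approximate, which is why the relation is written with ≈ rather than =.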

Modern NLP — Transformer Era

BERT: bidirectional encoder, used for classification and QA.
GPT: text generation.
T5: every task framed as text-to-text.
Sentence-BERT: semantic similarity.
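
One way to picture the difference between BERT's bidirectional encoder and GPT's left-to-right generation is the attention mask each one uses. The sketch below only builds the two masks for a short token sequence. No real model is loaded, and the function names `bidirectional_mask` and `causal_mask` are illustrative.

```python
def bidirectional_mask(n):
    """BERT-style: every position may attend to every position."""
    return [[1] * n for _ in range(n)]

def causal_mask(n):
    """GPT-style: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

tokens = ["the", "bank", "of", "the", "river"]
masks = [("bidirectional", bidirectional_mask(len(tokens))),
         ("causal", causal_mask(len(tokens)))]
for name, mask in masks:
    print(name)
    for tok, row in zip(tokens, mask):
        # Each row shows which positions this token is allowed to see.
        print(f"  {tok:>6} {row}")
```

In the causal mask, "bank" cannot see "river". That is one reason an encoder such as BERT suits classification and QA, where the whole input is available at once, while a causal model suits generation.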

Common Tasks

Text Classification (sentiment, spam).
NER: entity extraction.
Question Answering.
Summarization.
Translation.
Topic Modeling (LDA).

Code: A Few Tasks with HuggingFace

nlp_demo.py

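from transformers import pipeline

# Each pipeline downloads a default English model on first use.
clf = pipeline("sentiment-analysis")
print(clf("Lovable is a great tool!"))

# aggregation_strategy replaces the deprecated grouped_entities flag;
# "simple" merges word pieces into whole entities.
ner = pipeline("ner", aggregation_strategy="simple")
print(ner("Sundar Pichai is the CEO of Google in California."))

summ = pipeline("summarization")
# Replace the placeholder with a real article; lengths are in tokens.
print(summ("Long article text...", max_length=60, min_length=20))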

TF-IDF + Logistic Regression (classical)

tfidf.py

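from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# X_train / X_test are assumed to be lists of raw text strings and
# y_train / y_test their labels, prepared beforehand.
# Unigrams + bigrams; min_df=2 drops terms seen in fewer than 2 documents.
pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2),
                     LogisticRegression(max_iter=1000))
pipe.fit(X_train, y_train)
print(pipe.score(X_test, y_test))  # mean accuracy on the test set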

Common Mistakes

Mismatched tokenizer and model.
Using an English-only model on Bangla or multilingual text.
Removing stopwords and destroying the meaning (not needed with BERT).
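
The stopword mistake is easy to reproduce. In the sketch below, the `STOPWORDS` set is a small illustrative list made up for this example, but several widely used lists (NLTK's English list, for instance) do include negations such as "not". The helper name `remove_stopwords` is also an assumption.

```python
# Small illustrative stopword list (an assumption). It includes the
# negations "not" and "no", as some standard lists do.
STOPWORDS = {"the", "is", "was", "this", "not", "no", "a"}

def remove_stopwords(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return " ".join(w for w in text.lower().split() if w not in STOPWORDS)

positive = "this movie was good"
negative = "this movie was not good"

print(remove_stopwords(positive))  # movie good
print(remove_stopwords(negative))  # movie good
```

Both reviews collapse to the same string, so a downstream sentiment model can no longer tell them apart.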

Summary

At a glance

NLP = Token → Embedding → Model (classical or Transformer) → Task. Today the Transformer is the default choice.
