Hook: Apps in the LLM Era
These days a chatbot can be built in a few lines of code. A *good* chatbot, though, needs state, memory, tool use, and guardrails, which together make it a mini-system.
Define the Scope
- Domain: general, customer support, tutor, code helper?
- Persona: what tone will it use, and in which language?
- Memory: session-only, or persistent per user?
- Tools: search, calculator, database query?
- Safety: which topics will it refuse?
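One way to make these scope decisions concrete is to record them in a small config object and render the system prompt from it. The sketch below is only an illustration: `BotScope`, its field names, and `to_system_prompt` are hypothetical and not part of any library.

```python
from dataclasses import dataclass, field


@dataclass
class BotScope:
    """Hypothetical container for the scope decisions (domain, persona, memory, tools, safety)."""
    domain: str = "tutor"
    persona: str = "friendly and concise"
    reply_language: str = "the user's language"
    persistent_memory: bool = False  # False means session-only memory
    tools: list[str] = field(default_factory=list)
    refuse_topics: list[str] = field(default_factory=list)

    def to_system_prompt(self, name: str) -> str:
        """Render the scope as a system prompt string."""
        lines = [
            f"You are {name}, a {self.persona} {self.domain} assistant.",
            f"- Always reply in {self.reply_language}.",
        ]
        if self.tools:
            lines.append(f"- Available tools: {', '.join(self.tools)}.")
        if self.refuse_topics:
            lines.append(f"- Refuse: {', '.join(self.refuse_topics)}.")
        return "\n".join(lines)


scope = BotScope(tools=["search"], refuse_topics=["medical", "legal"])
print(scope.to_system_prompt("Koro"))
```

Keeping scope in one place means a change such as adding a refused topic updates the prompt without touching the endpoint code.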
Architecture
high-level
User ↔ FastAPI ↔ Orchestrator
                      │
           ┌──────────┼──────────┐
        System     Memory    Tool/RAG
        Prompt      Store      Layer
                      │
                   LLM API
System Prompt Pattern
system.py
SYSTEM = """You are Koro, a friendly Bangla-English tutor.
- Always reply in the user's language.
- Be concise (under 4 sentences).
- If unsure, say so and ask a clarifying question.
- Refuse: medical, legal, financial personalized advice.
"""
Memory: Short + Long
- Short-term: a buffer of the last N turns.
- Summary memory: long conversations are summarized automatically.
- Long-term: user facts stored in a vector DB (pgvector).
- Retrieval: on every turn, pull the relevant facts and insert them into the context.
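The short-term buffer and the per-turn retrieval step can be sketched in plain Python. This is a minimal illustration under stated assumptions: `ShortTermMemory` and `retrieve_facts` are hypothetical names, and the word-overlap ranking is only a stand-in for embedding similarity search in a vector DB such as pgvector. Summary memory is not shown.

```python
import re
from collections import deque
from typing import Optional


class ShortTermMemory:
    """Keeps the system prompt plus the most recent messages."""

    def __init__(self, system_prompt: str, max_turns: int = 4):
        self.system = {"role": "system", "content": system_prompt}
        # One turn = one user message + one assistant message.
        self.buffer = deque(maxlen=2 * max_turns)

    def add(self, role: str, content: str) -> None:
        self.buffer.append({"role": role, "content": content})

    def messages(self, facts: Optional[list[str]] = None) -> list[dict]:
        """Build the message list, optionally injecting retrieved user facts."""
        msgs = [self.system]
        if facts:
            body = "\n".join(f"- {f}" for f in facts)
            msgs.append({"role": "system", "content": "Known user facts:\n" + body})
        return msgs + list(self.buffer)


def _words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def retrieve_facts(query: str, facts: list[str], k: int = 2) -> list[str]:
    """Toy retrieval: rank stored facts by word overlap with the query.

    A real system would compare embeddings in a vector DB instead.
    """
    q = _words(query)
    scored = [(len(q & _words(f)), f) for f in facts]
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    return [f for score, f in ranked if score > 0][:k]
```

On each turn, the orchestrator would call `retrieve_facts` with the new user message and pass the result to `messages(facts=...)` before calling the LLM.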
Code: Streaming Chat Endpoint
chat.py
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI
from pydantic import BaseModel

from system import SYSTEM  # system prompt defined in system.py

client = OpenAI()  # reads OPENAI_API_KEY from the environment
app = FastAPI()

# In-memory session store: fine for a demo, lost on restart.
SESSIONS: dict[str, list[dict]] = {}


class Msg(BaseModel):
    session_id: str
    text: str


@app.post("/chat")
def chat(m: Msg):
    # Each new session starts with the system prompt.
    history = SESSIONS.setdefault(m.session_id, [{"role": "system", "content": SYSTEM}])
    history.append({"role": "user", "content": m.text})

    def stream():
        full = ""
        for chunk in client.chat.completions.create(
            model="gpt-4o-mini", messages=history, stream=True
        ):
            if not chunk.choices:  # a chunk can arrive with no choices
                continue
            delta = chunk.choices[0].delta.content or ""
            full += delta
            yield delta
        # Store the complete reply once streaming has finished.
        history.append({"role": "assistant", "content": full})

    return StreamingResponse(stream(), media_type="text/plain")
Tool Use: Function Calling
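With function calling, the model only names a function and supplies JSON arguments; the application has to run it. The tools.py snippet in this section calls a `run_tool` helper without defining it. A minimal sketch of such a dispatcher might look like the following, assuming a registry of plain Python callables. `TOOL_REGISTRY` and the stubbed `get_weather` body are hypothetical.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical stub; a real version would call a weather service."""
    return f"(stub) sunny in {city}"


# Maps tool names, as declared to the model, to Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool(name: str, arguments) -> str:
    """Run a registered tool and return its result as a string.

    `arguments` may be the raw JSON string sent by the model or an
    already-parsed dict. Errors are returned as text so the model can react.
    """
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return f"error: unknown tool '{name}'"
    try:
        args = json.loads(arguments) if isinstance(arguments, str) else arguments
        return str(func(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        return f"error: bad arguments for '{name}': {exc}"
```

Returning errors as strings rather than raising keeps the conversation going: the error text goes back to the model as the tool result, and the model can retry or apologize.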
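tools.py
import json

# `client` is the OpenAI client and `history` the session's message list, as in chat.py.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini", messages=history, tools=tools, tool_choice="auto"
)
msg = resp.choices[0].message
if msg.tool_calls:
    # The assistant turn that requested the tools must precede their results.
    history.append(msg)
    for call in msg.tool_calls:
        # `arguments` is a JSON string; run_tool is the app's own dispatcher.
        result = run_tool(call.function.name, json.loads(call.function.arguments))
        history.append({"role": "tool", "tool_call_id": call.id, "content": str(result)})
    # Ask the model again so it can answer using the tool output.
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=history, tools=tools)
Guardrail & Safety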
- Input filter: PII and prompt-injection checks.
- Output filter: toxicity, regex scan for secrets.
- Refusal templates: soft, helpful redirects.
- Rate limit + cost cap per user.
- Log every conversation (with consent) for QA.
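The input filter and per-user rate limit from the list above could be sketched as follows. The regex patterns and injection phrases are illustrative assumptions only; real PII and prompt-injection detection needs far more than this. `check_input` and `RateLimiter` are hypothetical names.

```python
import re
import time
from collections import defaultdict, deque
from typing import Optional

# Illustrative patterns only, not a complete PII detector.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}
INJECTION_PHRASES = ("ignore previous instructions", "reveal your system prompt")


def check_input(text: str) -> list[str]:
    """Return the guardrail flags raised by a user message."""
    flags = [f"pii:{name}" for name, pat in PII_PATTERNS.items() if pat.search(text)]
    lowered = text.lower()
    if any(phrase in lowered for phrase in INJECTION_PHRASES):
        flags.append("prompt_injection")
    return flags


class RateLimiter:
    """Sliding window: at most `limit` requests per `window` seconds per user."""

    def __init__(self, limit: int = 5, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # user_id -> timestamps of recent requests

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[user_id]
        while q and now - q[0] >= self.window:
            q.popleft()  # drop requests that fell out of the window
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```

In the chat endpoint, a request would be rejected early if `allow` returns `False`, and a flagged message could either be blocked or answered with a refusal template.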
Evaluation
- Offline: golden Q/A set → exact match / LLM-as-judge.
- Online: thumbs-up rate, conversation length, retention.
- Red-teaming: test jailbreak prompts regularly.
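The offline exact-match check can be sketched in a few lines. The golden set and `fake_bot` below are hypothetical stand-ins; a real run would call the chat endpoint, and an LLM-as-judge would replace the string comparison for open-ended answers.

```python
from typing import Callable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def exact_match_score(bot: Callable[[str], str], golden: list[tuple[str, str]]) -> float:
    """Fraction of golden questions answered exactly right after normalization."""
    if not golden:
        return 0.0
    hits = sum(normalize(bot(q)) == normalize(a) for q, a in golden)
    return hits / len(golden)


# Hypothetical golden set and a stand-in bot for demonstration.
GOLDEN = [("Capital of Bangladesh?", "Dhaka"), ("2 + 2?", "4")]


def fake_bot(question: str) -> str:
    canned = {"Capital of Bangladesh?": "  dhaka ", "2 + 2?": "5"}
    return canned.get(question, "")


print(exact_match_score(fake_bot, GOLDEN))  # 0.5
```

Tracking this score across prompt or model changes makes regressions visible before users see them.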
Summary
At a glance
Chatbot = Prompt + Memory + Tool + Guardrail. A streaming UX is essential. Skip evaluation and hallucinations grow silently.