Hook — Analytics এর Engine
Production database (OLTP) এ অসংখ্য ছোট transaction হয় — analytics query চালালে slow। Data Warehouse (OLAP) সেই দ্বন্দ্ব সমাধান করে — billion row aggregate seconds এ।
OLTP vs OLAP
- OLTP — row-based, short transaction (Postgres, MySQL)।
- OLAP — column-based, aggregate scan (Snowflake, BigQuery, Redshift, ClickHouse)।
- Columnar = একই column এর সব value পাশাপাশি = compression + scan ভালো।
Data Modeling
- Star Schema — center এ fact, পাশে dimension।
- Snowflake Schema — dimension আবার normalize।
- Data Vault — flexible enterprise model।
- One Big Table (OBT) — columnar এ দ্রুত, dbt friendly।
Major Warehouses
- Snowflake — multi-cloud, separation of storage/compute।
- BigQuery — serverless, pay-per-query।
- Redshift — AWS native।
- Databricks SQL — Lakehouse।
- ClickHouse — open source, blazing fast।
- Postgres + DuckDB — small/medium scale।
Code — Star Schema SQL
schema.sql
-- Dimension tables
CREATE TABLE dim_customer (
customer_id INT PRIMARY KEY,
name TEXT,
country TEXT,
signup_date DATE
);
CREATE TABLE dim_product (
product_id INT PRIMARY KEY,
name TEXT,
category TEXT,
price DECIMAL
);
CREATE TABLE dim_date (
date_id DATE PRIMARY KEY,
year INT, month INT, day INT,
weekday TEXT, is_weekend BOOLEAN
);
-- Fact table
CREATE TABLE fact_sales (
sale_id BIGINT PRIMARY KEY,
date_id DATE REFERENCES dim_date,
customer_id INT REFERENCES dim_customer,
product_id INT REFERENCES dim_product,
quantity INT,
revenue DECIMAL
);
-- Analytical query
SELECT d.month, p.category, sum(f.revenue) AS revenue
FROM fact_sales f
JOIN dim_date d ON f.date_id = d.date_id
JOIN dim_product p ON f.product_id = p.product_id
WHERE d.year = 2025
GROUP BY 1, 2
ORDER BY 1, 3 DESC;Performance Tricks
- Partitioning — date column দিয়ে scan কমাও।
- Clustering / Z-ordering — co-located related row।
- Materialized view — pre-aggregated result।
- Approximate functions — APPROX_COUNT_DISTINCT।
- Query result cache।
- Result reuse / incremental model (dbt)।
Cost Control
- SELECT * এড়াও — শুধু দরকারি column।
- Partition filter সবসময় add করো।
- Auto-suspend warehouse (Snowflake)।
- Query monitoring + budget alert।
- Workload — analyst vs ETL vs ML আলাদা warehouse।
ML Integration
- BigQuery ML, Snowflake Cortex — SQL এ-ই model।
- dbt + ML — feature view।
- Reverse ETL — warehouse → operational tool (Hightouch, Census)।
Summary
এক নজরে
Warehouse = OLAP, columnar, star schema। Snowflake/BigQuery + dbt + materialized view = scalable analytics + ML feature pipeline।