Database Indexing

Index = data structure แยกจาก heap ที่ช่วยค้นหา row เร็วขึ้น — เกือบทุก DB ใช้ B+Tree เป็นมาตรฐาน

Demo: ผลของ Index (11 ล้านแถว)

-- ไม่มี index → Parallel Seq Scan → ~3 วินาที
EXPLAIN ANALYZE SELECT id FROM employees WHERE name = 'ZS';
 
-- สร้าง index
CREATE INDEX employees_name ON employees(name);
 
-- มี index → Bitmap Index Scan → ~47 ms
EXPLAIN ANALYZE SELECT id FROM employees WHERE name = 'ZS';
-- ความเร็วเพิ่ม ~60 เท่า!

B-Tree vs B+Tree

B-Tree

แต่ละ node เก็บ (key, value) — value อยู่ทุก node (รวม internal)
1 node = 1 disk page ในระบบ DB จริง
ข้อจำกัด: internal nodes ใหญ่ (เก็บ value ด้วย) → tree สูง → IO มาก
Range query แย่ — ต้อง traverse หลาย subtree

B+Tree (เกือบทุก DB ใช้)

Internal nodes เก็บแค่ key → เล็กกว่า → fit ใน memory ง่าย
Values อยู่ใน leaf nodes เท่านั้น
Leaf nodes ทุกตัว linked list ต่อกัน → range query = หา leaf แรก → walk linked list → เร็วมาก!

B+Tree ใน MySQL vs Postgres

ด้าน	MySQL (InnoDB)	PostgreSQL
Primary index	Clustered — leaf เก็บ PK + row data	ชี้ไป heap (tuple id)
Secondary index ชี้ไป	Primary key → ต้อง 2 lookups	Tuple id → 1 lookup
UPDATE ผลต่อ secondary	ไม่ต้อง update (ถ้า PK ไม่เปลี่ยน)	ต้อง update ทุก index (write amplification)

EXPLAIN ANALYZE — เครื่องมือเทพ

EXPLAIN ANALYZE SELECT * FROM grades ORDER BY g;

ผลลัพธ์: cost=startup..total rows=N width=B

cost: arbitrary unit (ไม่ใช่ ms!) — ใช้เปรียบเทียบ plans
rows: จำนวน row ที่คาดว่าจะคืน (จาก statistics)
width: ขนาดเฉลี่ยของ row เป็น byte

Tip: ต้องการ count คร่าว ๆ บนตารางใหญ่ → ใช้ค่า rows จาก EXPLAIN แทน SELECT COUNT(*)

Scan Types ใน Postgres

Scan Type	คำอธิบาย	เร็ว/ช้า
Seq Scan	อ่านทุก page ของ heap	ช้าสุด
Parallel Seq Scan	Seq scan หลาย worker	ช้าแต่เร็วกว่า single
Index Scan	ใช้ index → pointer → เปิด heap	เร็ว
Index Only Scan	ทุก column อยู่ใน index → ไม่ต้องเปิด heap	เร็วที่สุด
Bitmap Index Scan	สร้าง bitmap ของ pages ที่ต้องอ่าน	เร็ว หลายแถว
Bitmap Heap Scan	ใช้ bitmap ไปอ่าน heap	คู่กับ bitmap index

Bitmap Index Scan (ลึก)

สร้าง bitmap ([0,0,1,0,1,1,...]) แต่ละ bit = 1 page
ถ้า query มี 2 conditions AND → สร้าง 2 bitmaps แล้ว AND กัน
เปิดเฉพาะ pages ที่ bit = 1 → หลีกเลี่ยง random IO

Index Only Scan + VACUUM

ต้องการ Visibility Map ที่ up-to-date (สร้างโดย VACUUM)
เพิ่ง bulk insert → ลอง VACUUM ก่อนถ้า Index Only Scan ไม่เกิด

Composite (Multi-Column) Index

CREATE INDEX idx_emp_name_dept ON employees(name, department);

กฎ leftmost prefix:

WHERE name = 'X' → ใช้ได้
WHERE name = 'X' AND department = 'Y' → ใช้ได้
WHERE department = 'Y' → ใช้ไม่ได้ (name อยู่ซ้ายสุด)

จัดลำดับ column ใน composite index ตาม selectivity — column ที่ unique มากไว้ซ้ายสุด

Pitfalls ที่ทำให้ Index ใช้ไม่ได้

-- Function on column → Seq Scan
SELECT * FROM employees WHERE LOWER(name) = 'zs';
-- ทางแก้: CREATE INDEX ON employees(LOWER(name));
 
-- Wildcard นำหน้า → Seq Scan
SELECT id FROM employees WHERE name LIKE '%ZS%';
-- ทางแก้: Trigram index (pg_trgm) หรือ full-text search
 
-- Prefix-only LIKE → ใช้ index ได้
SELECT id FROM employees WHERE name LIKE 'ZS%';

Index Maintenance

ทุก INSERT → update index ทุกตัว
Index bloat: หลังหลาย UPDATE/DELETE → dead entries → query ช้า
ทางแก้: REINDEX INDEX CONCURRENTLY idx_name; (Postgres 12+, ไม่ lock)
Statistics outdated → planner ใช้ index ผิด → ANALYZE table_name;

SELECT * ทำไมแย่?

Network bandwidth — ส่ง column ที่ไม่ใช้
Memory — app โหลดทั้งหมด
Disk IO — TOAST column ต้อง decompress
Index Only Scan ใช้ไม่ได้ — ต้องเปิด heap เสมอ
Schema change risk — เพิ่ม column ใหม่อาจพัง

กฎทอง: ระบุ column ที่ต้องการเสมอ

Key Points

B+Tree = มาตรฐาน — leaf nodes linked → range query เร็ว
EXPLAIN ANALYZE = เครื่องมือเทพ — cost เป็น arbitrary unit ไม่ใช่ ms
Index Only Scan = เร็วสุด (ไม่ต้องเปิด heap) — ต้อง VACUUM ก่อน
Composite index ใช้ leftmost prefix rule
Function/wildcard นำหน้าใน WHERE → index ใช้ไม่ได้
index ไม่ฟรี — ทุก write ต้อง update index → อย่า index ทุก column
อย่า SELECT * — ระบุ column เสมอ

Database Internals — Pages, Heap, IO ที่ index ช่วยลด
PostgreSQL — EXPLAIN, scan types, VACUUM, HOT update
MySQL — Clustered index, InnoDB B+Tree
Database Engines — B-Tree vs LSM-Tree engines

Net's Digital Garden 🌱

รายการหน้า

Database Indexing

Database Indexing

Demo: ผลของ Index (11 ล้านแถว)

B-Tree vs B+Tree

B-Tree

B+Tree (เกือบทุก DB ใช้)

B+Tree ใน MySQL vs Postgres

EXPLAIN ANALYZE — เครื่องมือเทพ

Scan Types ใน Postgres

Bitmap Index Scan (ลึก)

Index Only Scan + VACUUM

Composite (Multi-Column) Index

Pitfalls ที่ทำให้ Index ใช้ไม่ได้

Index Maintenance

SELECT * ทำไมแย่?

Key Points

มุมมองกราฟ

สารบัญ

หน้าที่กล่าวถึง

Net's Digital Garden 🌱

รายการหน้า

Database Indexing

Database Indexing

Demo: ผลของ Index (11 ล้านแถว)

B-Tree vs B+Tree

B-Tree

B+Tree (เกือบทุก DB ใช้)

B+Tree ใน MySQL vs Postgres

EXPLAIN ANALYZE — เครื่องมือเทพ

Scan Types ใน Postgres

Bitmap Index Scan (ลึก)

Index Only Scan + VACUUM

Composite (Multi-Column) Index

Pitfalls ที่ทำให้ Index ใช้ไม่ได้

Index Maintenance

SELECT * ทำไมแย่?

Key Points

Related

มุมมองกราฟ

สารบัญ

หน้าที่กล่าวถึง