Database Internals

โครงสร้างพื้นฐานภายใน database ที่ต้องเข้าใจก่อนคุยเรื่อง index/query optimization — ทุกอย่างเริ่มจาก Pages และ IO

Page (หน่วยพื้นฐานของ I/O)

กฎเหล็ก: Database ไม่อ่านทีละ row — มันอ่านทีละ page (block ขนาดคงที่)

Database	Page Size
PostgreSQL	8 KB
MySQL (InnoDB)	16 KB
MongoDB (WiredTiger)	32 KB
SQL Server	8 KB
Oracle	8 KB

1 page เก็บได้หลาย rows (depends on row size)
ขอ row 1 row → DB อ่าน 1 page ทั้ง page (row อื่นในเดียวกันได้ “ฟรี”)
ตาราง row เล็ก = density ดี = อ่าน page น้อย = เร็วกว่า

IO (Input/Output Operation)

IO = สกุลเงินของ database — ทุกอย่างที่เราทำคือพยายามลด IO

1 IO = อ่าน 1 page (หรือหลาย pages)
IO ไม่สามารถอ่าน “1 row” แยก — อ่าน “1 page” เสมอ
บาง IO ไป OS cache (เร็ว ~microsec) บางอันลง disk จริง (ช้ากว่า 100-1000x)

Storage	Random Read Latency
HDD	~10 ms
SSD	~0.1 ms
RAM	~0.0001 ms

Row ID (Tuple ID)

Database	Row ID
MySQL (InnoDB)	Primary key = row id (clustered index)
PostgreSQL	`ctid` (tuple id) แยกจาก PK

ผลกระทบ:

Postgres: UPDATE row → สร้าง tuple ใหม่ → ctid เปลี่ยน → secondary index ทุกอัน update (write amplification)
MySQL: UPDATE non-PK column → secondary index ไม่ต้องเปลี่ยน

Heap (ที่อยู่จริงของข้อมูล)

Heap = data structure ที่เก็บ table ทั้งหมด page ต่อ page เรียงกัน

ข้อมูลจริง ๆ (ทุก column) อยู่ใน heap
Traverse heap แพง — ต้องอ่านหลาย pages
จึงต้องมี index มาบอกว่า page ไหนต้องดึง

Index (ทางลัดไปยัง heap)

Index = data structure แยกจาก heap ที่มี pointers ชี้กลับไป heap

เก็บ subset ของ data (เฉพาะ indexed columns + pointer)
index เองก็เก็บเป็น pages → ต้องใช้ IO เพื่ออ่าน
ยิ่ง index เล็ก → ยิ่ง fit ใน memory → search เร็ว
Data structure ที่นิยมที่สุด: B+Tree

Flow การ query แบบใช้ index:

IO ที่ 1: อ่าน index page → หา key → ได้ pointer
IO ที่ 2: อ่าน heap page ตาม pointer → ได้ row จริง

ถ้าไม่มี index → full table scan = อ่านทุก page ของ heap

index ไม่ใช่ “ของฟรี” — ทุก INSERT/UPDATE/DELETE ต้อง update index ด้วย → write ช้าลง

Buffer Pool / Shared Buffers

[Application] <-> [DB Process] <-> [Buffer Pool (RAM)] <-> [Disk]

DB เก็บ pages ที่ใช้บ่อยใน RAM (“buffer pool”)
READ → check buffer pool ก่อน → cache miss → อ่าน disk → load เข้า buffer pool
WRITE → modify ใน buffer pool → mark dirty → flush ลง disk แบบ async (with WAL)

Database	Parameter	ค่าแนะนำ
PostgreSQL	`shared_buffers`	25% ของ RAM
MySQL InnoDB	`innodb_buffer_pool_size`	50-70% ของ RAM

Pitfall: ตั้ง buffer pool ใหญ่เกิน → swap → ช้ายิ่งกว่า disk จริง

Key Points

Database อ่านทีละ page ไม่ใช่ row — Postgres 8KB, MySQL 16KB, MongoDB 32KB
IO = สกุลเงินของ DB — ทุกการ optimize = ลด IO
Heap เก็บข้อมูลจริง, Index เป็นทางลัดชี้ไป heap
MySQL ใช้ PK เป็น row id (clustered), Postgres ใช้ ctid แยก
Buffer Pool = cache pages ใน RAM — ตั้งค่าให้เหมาะ ไม่ใหญ่เกิน

Database Indexing — B+Tree, EXPLAIN, scan types
PostgreSQL — heap-organized, 8KB pages, MVCC
MySQL — clustered index, 16KB pages
WAL — durability ของ dirty pages ใน buffer pool
Database Engines — engine ต่าง ๆ จัดการ internals ต่างกัน

Net's Digital Garden 🌱

รายการหน้า

Database Internals

Database Internals

Page (หน่วยพื้นฐานของ I/O)

IO (Input/Output Operation)

Row ID (Tuple ID)

Heap (ที่อยู่จริงของข้อมูล)

Index (ทางลัดไปยัง heap)

Buffer Pool / Shared Buffers

Key Points

มุมมองกราฟ

สารบัญ

หน้าที่กล่าวถึง

Net's Digital Garden 🌱

รายการหน้า

Database Internals

Database Internals

Page (หน่วยพื้นฐานของ I/O)

IO (Input/Output Operation)

Row ID (Tuple ID)

Heap (ที่อยู่จริงของข้อมูล)

Index (ทางลัดไปยัง heap)

Buffer Pool / Shared Buffers

Key Points

Related

มุมมองกราฟ

สารบัญ

หน้าที่กล่าวถึง