Data Partitioning Strategies
Prevent full table scans on multi-terabyte datasets by partitioning tables into smaller, physical files.
- Prevent full table scans on multi-terabyte datasets by partitioning tables into smaller, physical files.
- Select partitioning keys that align with your query patterns to enable partition pruning.
- Avoid partition skew by monitoring and adjusting boundaries to prevent single disks from running out of space.
- Keep partition counts balanced to avoid metadata overhead and query planner degradation.
The Problem
As database tables grow into hundreds of millions of rows, query performance degrades exponentially. Even with proper indexing, B-Tree index depths increase, requiring more disk I/O operations per lookup. Maintenance tasks like rebuilding indexes, deleting old historical data, or running analytical queries require massive table locks and full table scans.
This leads to query timeouts, high CPU utilization, and operational gridlock where database administrators cannot safely modify schemas or purge expired data.
Core System Idea
Data partitioning is the process of physically dividing a single logical table into smaller, self-contained physical files (partitions) on disk, while presenting a single unified table interface to the application. The database engine automatically routes writes and reads to the appropriate partition based on a "partition key."
The primary performance benefit of partitioning is "partition pruning." When a query contains a filter on the partition key, the query optimizer analyzes the partition metadata and scans only the specific physical partitions that could contain the matching data, completely ignoring the rest of the table.
This keeps index sizes small, limits disk I/O, and allows operations like purging old data to be executed instantly by dropping an entire physical partition (e.g., DROP PARTITION) rather than running expensive, lock-heavy DELETE queries.
System Flow
Query optimizer executing partition pruning to scan only the relevant physical partition file, bypassing unrelated data.
Real-World Examples Indicative
Uber's trips table (billions of rows) is range-partitioned by created_at month, with each monthly partition at ~200GB. Partition pruning means driver earnings reports (WHERE created_at >= '2023-10-01') scan only 1-2 partitions instead of the full table. Old partitions older than 3 years are dropped with ALTER TABLE trips DROP PARTITION — a metadata-only operation completing in milliseconds, replacing the previous approach of running DELETE queries that locked the table and took hours to complete at billion-row scale.
Shopify's products table is list-partitioned by shop_id hash bucket (each bucket groups ~5,000 merchant shops). When a merchant queries their catalog (WHERE shop_id = 12345), the query optimizer resolves the bucket and scans only that partition's local B-Tree index (~1GB) rather than the full 50TB products table. During Black Friday 2022, Shopify partition-pruned 98%+ of disk I/O on catalog reads — a key enabler for sustaining 80K+ checkouts/min without adding horizontal database shards.
Athena reads data from S3 using Hive-style directory partitioning: /year=2023/month=10/day=15/. A query filtering WHERE year=2023 AND month=10 lists only the October directory (~50 Parquet files) instead of scanning the full multi-petabyte dataset. This reduces query cost from $5/TB scanned to under $0.01 for date-ranged queries. Cloudflare uses this pattern to let customers query 90 days of edge access logs: each query scans only the requested date partitions, not the full 90-day corpus.
Anti-Patterns
Partitioning a table by a unique identifier like user_id or transaction_id creates thousands of tiny partitions, bloating database metadata and destroying query planner performance.
Writing queries that do not filter on the partition key forces the database engine to perform a "scatter-gather" scan across every single partition, which is slower than scanning an unpartitioned table.
Creating daily partitions for a table that only receives a few thousand rows per day results in excessive file system overhead and inefficient disk space utilization.
Partitioning by a key that causes 90% of the data to land in a single partition (e.g., partitioning by status where 'active' is the dominant state) defeats the purpose of partitioning.
Design Tradeoffs
| Dimension | Range Partitioning | Hash Partitioning |
|---|---|---|
| Query optimization | Ideal for time-series and date-range queries; the optimizer prunes to a small set of partitions matching the filter | Inefficient for range queries; all partitions must be scanned since hash values are not contiguous |
| Write distribution | High risk of write hotspots on the active range partition (e.g., today's date partition receives all new inserts) | Distributes writes evenly across a fixed number of partitions using a hash function, minimizing hotspots |
| Data lifecycle management | Simple; drop an entire time-range partition to purge old data in milliseconds with no WAL overhead | Complex; deleting aged data requires row-level DELETE queries across all hash partitions |
Best Practices
WHERE clauses (typically a timestamp or tenant identifier).pg_partman in PostgreSQL) or scheduled cron jobs to pre-create future partitions before writes arrive.DELETE FROM table WHERE created_at < '2022-01-01', use ALTER TABLE table DROP PARTITION p2021 to instantly reclaim disk space without generating WAL overhead or table locks.When to Use / Avoid
| Use When | Avoid When |
|---|---|
| Single tables exceed 100GB in size or contain tens of millions of rows. | The total table size is under 10GB; indexes alone are highly efficient at this scale. |
| The dataset has a natural temporal lifecycle (e.g., logs, metrics, or transactional history that is archived after N days). | The query patterns are highly dynamic and rarely filter on a consistent column. |
| You need to run heavy analytical queries alongside transactional writes without resource contention. | The primary key must be globally unique and enforced across partitions without including the partition key. |