Database Sharding
Scale writes horizontally by partitioning a single logical database across multiple physical database nodes.
- Scale writes horizontally by partitioning a single logical database across multiple physical database nodes.
- Select a high-cardinality shard key to prevent hotspot shards and ensure even data distribution.
- Avoid cross-shard queries by designing schemas that allow most operations to execute on a single shard.
- Use virtual sharding to minimize the operational cost and data movement during resharding events.
The Problem
When a relational database grows to terabytes of data, vertical scaling (adding CPU, RAM, and SSDs) hits a hard physical and financial ceiling. High write volumes exhaust disk I/O, lock contention degrades transaction performance, and index sizes exceed available memory, forcing slow disk reads.
Furthermore, operations like database backups, schema migrations, and index rebuilds take days to complete, creating massive operational risks and prolonged maintenance windows.
Core System Idea
Database sharding is a horizontal partitioning technique that splits a single logical database into multiple independent physical database instances (shards). Each shard contains a subset of the total data, and no shard has a complete view of the entire dataset.
To route queries, the system uses a deterministic "shard key" (e.g., user_id or tenant_id). An application routing layer or middleware hashes the shard key to determine which physical node holds the requested data.
By ensuring that writes and reads for a specific entity are routed to a single physical node, sharding allows write throughput and storage capacity to scale linearly with the number of nodes.
System Flow
Database routing layer hashing a shard key to distribute read and write queries across isolated physical database instances.
Real-World Examples Indicative
YouTube runs Vitess to manage horizontal MySQL sharding across thousands of nodes handling billions of video metadata rows. When a schema migration is needed (e.g., adding a column to the videos table), Vitess uses gh-ost to perform online migrations: gh-ost creates a shadow table, copies rows in chunks of 1,000, and applies binlog changes to the shadow table in parallel, then performs an atomic table swap. This allows YouTube to run schema changes on individual shards without any downtime or table locks — a migration that would take 8+ hours on a monolithic MySQL instance completes shard-by-shard in rolling 20-minute windows.
Instagram encodes the shard ID directly into each generated entity ID using a Snowflake-style layout: 41 bits timestamp, 13 bits shard ID, 10 bits sequence. This eliminates the need for a routing lookup table entirely — given any photo_id, the routing layer extracts bits 11–23 to determine the target shard in O(1) with zero network hops. Instagram operates 512 logical shards mapped across fewer physical PostgreSQL nodes; adding capacity only requires remapping logical shards to new physical nodes with no ID regeneration.
Shopify shards their MySQL cluster by shop_id, ensuring all orders, products, and inventory records for a merchant land on a single shard. This design allows the Black Friday checkout pipeline to sustain 80K+ checkouts/minute without cross-shard joins — an entire merchant's checkout flow, from cart lock to order confirmation, executes within a single physical database node. When a large merchant is projected to create a hot shard (e.g., a flash sale), Shopify pre-migrates that merchant's shard to a dedicated node cluster days before the event.
Anti-Patterns
Sharding by a key with few unique values (e.g., country_code or status) creates massive hotspot shards that overwhelm individual physical nodes.
Performing queries that join tables across different physical shards forces the application or routing layer to pull massive datasets over the network, destroying performance.
Implementing sharding before optimizing indexes, introducing read replicas, or partitioning tables locally adds massive architectural complexity without immediate benefit.
Mapping shard keys directly to physical IP addresses in application code makes resharding or node failover an operational nightmare.
Design Tradeoffs
| Dimension | Hash-Based Sharding | Range-Based Sharding |
|---|---|---|
| Write distribution | Distributes writes uniformly across all shards via hashing, minimizing the risk of hotspot nodes | High risk of write hotspots on the active shard (e.g., today's date shard receives all new writes) |
| Range query performance | Requires scatter-gather across all shards for range queries, with O(N shards) network fan-out | Groups related data by range onto the same shard, allowing efficient range scans with a single node query |
| Resharding complexity | Moving data requires rehashing a large percentage of keys unless consistent hashing is used to minimize churn | Simple; split the active range shard at a boundary into two new shards with no key remapping needed |
Best Practices
user_id, organization_id) to ensure uniform data and traffic distribution.When to Use / Avoid
| Use When | Avoid When |
|---|---|
| Write throughput exceeds the physical limits of the largest available single database instance. | The dataset fits comfortably in the memory of a single large database instance. |
| Storage requirements grow beyond the capacity of a single physical disk array. | The application relies heavily on complex, multi-table joins and distributed ACID transactions. |
Multi-tenant applications where data can be naturally isolated by tenant_id. | The query patterns are highly dynamic and cannot be mapped to a single shard key. |