Read/Write Patterns at Scale
Scale read capacity by routing queries to read replicas while managing replication lag expectations.
- Scale read capacity by routing queries to read replicas while managing replication lag expectations.
- Absorb massive write spikes using write-behind caching backed by durable message queues.
- Mitigate hot-key bottlenecks by implementing cache leases or local memory caching.
- Guarantee read-your-writes consistency by routing post-write reads to the primary database.
The Problem
As application traffic scales, databases become the ultimate bottleneck. Simply adding more application servers does nothing to solve database CPU exhaustion caused by millions of read queries, nor does it prevent write lock contention when thousands of users update the same database rows simultaneously.
Furthermore, naive caching strategies introduce race conditions, stale data, and cache stampedes that crash downstream databases the moment a popular cache key expires.
Core System Idea
Scaling reads and writes requires decoupling the data access patterns using specialized architectural patterns.
For read-heavy workloads, we implement Read Replicas. Writes go to a single primary database, which asynchronously replicates changes to multiple read-only instances. To prevent users from seeing stale data immediately after they make an update (due to replication lag), we use "read-your-writes" consistency routing: reads occurring within a short window after a write are routed directly to the primary database, while all other reads go to replicas.
For write-heavy workloads, we use Write-Behind (Write-Back) Caching. The application writes directly to a high-speed cache layer and immediately returns success to the user. An asynchronous worker pool then pulls these write events from a queue and batches them into the primary database.
To handle "hot keys" (highly popular items like viral posts), we use Cache Leases (or single-flight mutexes) to ensure only one application thread queries the database to populate the cache, while other threads wait for the cached result.
System Flow
Read-replica routing with write-behind caching to decouple read and write paths under high load.
Real-World Examples Indicative
Twitter's home timeline uses fan-out-on-write for accounts with fewer than ~20M followers: when a user tweets, the tweet ID is pre-written to each follower's Redis timeline cache. For high-follower accounts (>20M), Twitter switches to fan-out-on-read: the celebrity's tweets are not pre-fanned; instead, a timeline read merges the pre-computed cache with a real-time pull from the celebrity's tweet list. Without this hybrid model, a single tweet from a 100M-follower account would trigger 100M Redis writes within seconds — a write amplification Twitter measured as 100x the infrastructure cost of a normal user tweet.
Facebook's TAO (The Associations and Objects) cache layer sits in front of MySQL for social graph data. To prevent stampedes on hot nodes (e.g., a viral post), TAO's mcrouter implements a lease mechanism: when a cache miss occurs, the first requesting client receives a 10ms exclusive lease token. All other clients requesting the same key during that window receive a "wait" response and retry after the lease holder has re-populated the cache from MySQL. This single lease-per-key guarantee collapses thousands of parallel DB fan-out queries into one, protecting MySQL from the thundering herd that a standard TTL expiry would cause on hot social objects.
Instagram processes like counts using a write-behind pattern. When a user likes a photo, the like count is incremented atomically in Redis with INCR photo:{id}:likes. An async worker pool reads deltas from Redis every 30 seconds and bulk-writes them to PostgreSQL with UPDATE photos SET likes_count = likes_count + $delta WHERE id = $id. This allows Instagram to absorb 10K+ likes/sec on viral posts without hitting PostgreSQL directly — a rate that would exhaust PostgreSQL row-lock capacity on a single high-traffic photo row.
Anti-Patterns
Directing a user to a page that reads from a replica immediately after they submitted a form leads to confusing "missing data" bugs due to replication lag.
Writing directly to an in-memory cache and relying on an in-memory background thread to update the database risks total data loss if the application container crashes.
Deleting cache keys without using version tokens or CAS (Compare-And-Swap) operations leads to race conditions where stale database values overwrite fresh cache data.
Implementing fan-out-on-write for users with millions of followers (the "celebrity problem") causes massive write spikes and system-wide latency.
Design Tradeoffs
| Dimension | Fan-Out on Write (Push) | Fan-Out on Read (Pull) |
|---|---|---|
| Read latency | O(1) lookup from a pre-computed per-user cache; feeds are served without any query-time computation | Slower; requires fetching and merging data from multiple sources at query time |
| Write amplification | High; a single write is duplicated across every follower's cache — catastrophic at 100M+ followers | Low; writes touch a single location; fan-out cost is deferred to read time |
| Storage and consistency | High storage and complex cache invalidation for inactive users who may never read their pre-computed feed | Simple storage; data is computed and cached only when actually requested by an active user |
Best Practices
INSERT INTO ... VALUES ...) to maximize database write throughput.When to Use / Avoid
| Use When | Avoid When |
|---|---|
| The application has a high read-to-write ratio (e.g., e-commerce catalogs, social media feeds). | The application requires strict, immediate ACID consistency across all read operations. |
| Handling massive, unpredictable spikes in writes (e.g., IoT sensor data, real-time gaming telemetry). | Writes must be immediately validated against complex database constraints before returning success. |
| Serving highly repetitive queries that can be naturally cached near the user. | The dataset is highly dynamic, unique per request, and rarely read more than once. |