Multi-Region Architecture
- Active-Active multi-region architectures require complex conflict resolution; physical clock-based resolution (LWW) causes data loss.
- Active-Passive architectures simplify writes by routing them to a single primary region, but introduce cross-region replication lag.
- Synchronous cross-region replication guarantees zero data loss (RPO=0) but severely degrades write latency due to the speed of light.
- CRDTs and operational transformation are the main ways to merge concurrent, multi-region writes safely without coordination; otherwise, conflicts need application-level merge logic.
The Problem
To survive the complete failure of an entire cloud provider region, or to provide low latency to a global user base, systems must deploy across multiple geographic regions. However, the speed of light limits how fast data can travel between continents (e.g., transatlantic round-trip time is ~70ms), so any operation that coordinates across regions pays at least that cost.
If a system uses synchronous replication across regions to guarantee consistency, every write pays at least one cross-region round trip, which can make latency-sensitive, user-facing applications feel sluggish. If the system replicates asynchronously to keep latency low, a region outage can lose any writes that had not yet replicated (RPO > 0). Furthermore, if writes are accepted in multiple regions simultaneously (Active-Active), concurrent updates to the same record will conflict, leading to silently lost updates or divergent database states unless they are resolved deterministically.
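The latency floor described above can be estimated with simple arithmetic. The sketch below is illustrative only: the fiber propagation factor (about two thirds of the vacuum speed of light), the New York to London distance, and the helper names `min_rtt_ms` and `sync_write_latency_ms` are assumptions, not measurements from any specific network.

```python
# Rough lower bound on cross-region round-trip time (RTT) from propagation delay alone.
SPEED_OF_LIGHT_KM_S = 299_792  # speed of light in vacuum
FIBER_FACTOR = 2 / 3           # assumption: light in optical fiber travels at roughly 2/3 c
NYC_LONDON_KM = 5_570          # assumption: approximate great-circle distance


def min_rtt_ms(distance_km: float) -> float:
    """Best-case round trip over fiber, ignoring routing, queuing, and processing."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


def sync_write_latency_ms(local_commit_ms: float, rtt_ms: float) -> float:
    """A synchronously replicated write cannot be acknowledged until the
    remote region confirms it, so it pays at least one full round trip."""
    return local_commit_ms + rtt_ms


floor_ms = min_rtt_ms(NYC_LONDON_KM)
print(f"propagation-only RTT floor: {floor_ms:.1f} ms")
print(f"sync write with 2 ms local commit and 70 ms RTT: {sync_write_latency_ms(2, 70):.0f} ms")
```

The propagation-only floor comes out in the mid-50ms range, which is consistent with the ~70ms observed figure once real cable routes and network equipment are taken into account.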
Core System Idea
Multi-region architectures must balance the trade-off between write latency, read freshness, and disaster recovery capabilities.
In an Active-Passive (Single-Primary) model, all writes are routed to a single "home" region. This region processes transactions locally and replicates changes asynchronously to passive "read replica" regions. This eliminates write conflicts but means users far from the primary region experience higher write latency, and passive regions may serve stale reads.
In an Active-Active (Multi-Primary) model, writes are accepted in any region. To prevent data divergence, the system must employ a deterministic conflict resolution strategy.
Because physical clocks drift across regions, Last-Write-Wins (LWW) based on timestamps can silently discard valid updates: the write that actually happened later may carry the earlier timestamp. Where lost updates are unacceptable, systems should use Conflict-Free Replicated Data Types (CRDTs), such as state-based grow-only counters or observed-remove sets, or implement application-level logic that merges concurrent writes deterministically.
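The contrast between the two strategies can be sketched in a few lines. This is a minimal illustration, not a production CRDT library: `LWWRegister` and `GCounter` are hypothetical classes, and the 50ms clock skew is an assumed value chosen to make the failure visible.

```python
from dataclasses import dataclass, field


@dataclass
class LWWRegister:
    """Last-Write-Wins register: keeps whichever write has the larger timestamp."""
    value: object = None
    timestamp: float = float("-inf")

    def write(self, value, timestamp: float) -> None:
        if timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp


@dataclass
class GCounter:
    """State-based grow-only counter: one slot per region, merged by per-slot max."""
    region: str
    counts: dict = field(default_factory=dict)

    def increment(self, n: int = 1) -> None:
        self.counts[self.region] = self.counts.get(self.region, 0) + n

    def merge(self, other: "GCounter") -> None:
        # max() is commutative, associative, and idempotent, so replicas
        # converge regardless of merge order or repeated delivery.
        for region, count in other.counts.items():
            self.counts[region] = max(self.counts.get(region, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


# LWW under skew: the EU clock is assumed to run 50 ms behind the US clock.
register = LWWRegister()
register.write("from-us", timestamp=100.000)  # true time 100.000
register.write("from-eu", timestamp=99.970)   # true time 100.020, stamped by the slow clock
print(register.value)  # the write that happened later in real time was discarded

# G-Counter: concurrent increments in two regions are both preserved.
us, eu = GCounter("us"), GCounter("eu")
us.increment(3)
eu.increment(2)
us.merge(eu)
eu.merge(us)
print(us.value(), eu.value())
```

The register ends up holding the US value even though the EU write happened later, while both counter replicas converge on the sum of all increments without any coordination.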
System Flow
In an Active-Passive multi-region setup, writes from the EU are proxied to the US primary region, while reads are served locally from the EU replica with asynchronous lag.
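The flow above can be sketched as a small in-memory model. Everything here is hypothetical (the `Region` and `ActivePassiveRouter` classes, the region names, and the explicit `replicate()` step standing in for asynchronous replication); it only illustrates where writes and reads land.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    name: str
    data: dict = field(default_factory=dict)


class ActivePassiveRouter:
    def __init__(self, primary: Region, replicas: list):
        self.primary = primary
        self.replicas = {r.name: r for r in replicas}
        self.pending = []  # writes committed on the primary but not yet shipped

    def write(self, client_region: str, key: str, value) -> str:
        # Every write is proxied to the primary, whatever region it came from.
        self.primary.data[key] = value
        self.pending.append((key, value))
        return self.primary.name

    def read(self, client_region: str, key: str):
        # Reads are served from the caller's local replica when one exists.
        region = self.replicas.get(client_region, self.primary)
        return region.data.get(key)

    def replicate(self) -> None:
        # Stand-in for asynchronous replication catching up.
        for key, value in self.pending:
            for replica in self.replicas.values():
                replica.data[key] = value
        self.pending.clear()


router = ActivePassiveRouter(Region("us"), [Region("eu")])
handled_by = router.write("eu", "profile:42", "v1")
stale = router.read("eu", "profile:42")   # replica has not caught up yet
router.replicate()
fresh = router.read("eu", "profile:42")
print(handled_by, stale, fresh)
```

The read issued between the write and `replicate()` returns nothing, which is the replication-lag window that read-your-own-writes routing has to cover.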
Real-World Examples (Indicative)
Spanner uses Google's TrueTime API, backed by GPS receivers and atomic clocks deployed in Google datacenters, to achieve external consistency. TrueTime reports wall-clock time as an interval [earliest, latest] that is guaranteed to contain the true time; the uncertainty is typically a few milliseconds (up to roughly 7ms). For a read-write transaction, Spanner picks a commit timestamp no earlier than the latest bound of the current interval and then waits until that timestamp is guaranteed to be in the past (commit-wait) before declaring the transaction committed. This guarantees that any transaction starting after commit-wait observes the committed state. Read-write transactions still acquire locks; it is snapshot reads at a timestamp that run lock-free. Google F1 (the database backing Google Ads) runs on Spanner across US, EU, and Asia, reportedly processing 40K+ transactions/sec with externally consistent cross-region reads, relying on bounded clock uncertainty rather than per-transaction clock coordination between regions.
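The commit-wait rule can be illustrated with a simulated clock. This is a sketch under stated assumptions, not Spanner's implementation or the real TrueTime API: `FakeTrueTime`, `TTInterval`, the 4ms uncertainty, and the `commit` helper are all hypothetical, and time is advanced manually instead of sleeping.

```python
from dataclasses import dataclass


@dataclass
class TTInterval:
    earliest: float
    latest: float


class FakeTrueTime:
    """Hypothetical stand-in for a clock that reports bounded uncertainty."""

    def __init__(self, epsilon_ms: float, now_ms: float = 0.0):
        self.epsilon_ms = epsilon_ms
        self._now_ms = now_ms

    def now(self) -> TTInterval:
        return TTInterval(self._now_ms - self.epsilon_ms,
                          self._now_ms + self.epsilon_ms)

    def advance(self, ms: float) -> None:
        self._now_ms += ms


def commit(tt: FakeTrueTime, step_ms: float = 1.0):
    """Pick a commit timestamp, then wait until it is certainly in the past."""
    s = tt.now().latest  # no earlier than any clock could currently read
    waited_ms = 0.0
    # Commit-wait: hold the acknowledgement until even the most pessimistic
    # bound on the current time has passed s.
    while tt.now().earliest <= s:
        tt.advance(step_ms)  # a real system would sleep here
        waited_ms += step_ms
    return s, waited_ms


tt = FakeTrueTime(epsilon_ms=4.0)
commit_ts, waited_ms = commit(tt)
print(commit_ts, waited_ms)
```

The wait comes out at roughly twice the clock uncertainty, which is why keeping the uncertainty small matters directly for write latency.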
Aurora Global Database uses proprietary storage-layer replication, not MySQL binlog replication, to stream redo log records from the primary region to up to 5 secondary regions. At the storage layer, this replication typically achieves under 1 second of lag (300-700ms P99 under 50K writes/sec, as reported in AWS re:Invent 2021 demos). During a simulated primary region failure, Aurora Global Database's managed failover promotes a secondary in under 60 seconds (RTO < 60s), compared to traditional MySQL replica promotion, which can require 20-40 minutes of manual coordination. Slack reportedly uses Aurora Global Database between us-east-1 (primary) and eu-west-1 (secondary) for its workspace metadata store, relying on the typically sub-second lag to implement read-your-own-writes session routing. Note that the sub-second figure is a typical value, not a hard guarantee, so routing logic should not assume a fixed delay.
Netflix runs Cassandra Active-Active across us-east-1, us-west-2, and eu-west-1. Viewing history writes are accepted in any region and replicated asynchronously (RF=3 per region, cross-region via Cassandra's multi-datacenter replication). Conflicts are resolved with Cassandra's Last-Write-Wins (LWW) rule, which compares per-cell write timestamps (the values exposed by WRITETIME()). However, because clocks in different AWS regions are never perfectly synchronized (skew of up to around 10ms is cited), LWW occasionally silently discards valid writes. Netflix mitigates this by routing writes for correctness-critical data (billing records, entitlements) exclusively to us-east-1 (Active-Passive), using Active-Active only for eventually-consistent data (viewing history, personalization signals) where LWW conflicts are invisible to users.
Anti-Patterns
Blocking an API request to perform a synchronous three-phase commit across US, EU, and Asia regions, resulting in multi-hundred-millisecond response times.
Resolving multi-region write conflicts using database-level Last-Write-Wins without a hardware-synchronized clock source (like TrueTime), leading to silent data loss during clock drift.
Routing a user's read request to a local secondary region immediately after they performed a write, causing the user to see stale data because the write has not yet replicated (violating read-your-own-writes consistency).
Treating multi-region disaster recovery as a theoretical capability without executing regular, automated region evacuation drills in production.
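One common way to avoid the stale-read anti-pattern above is to track, per session, the log position of the user's last write and fall back to the primary until the local replica has caught up. The sketch below assumes a monotonically increasing log sequence number (LSN); the `Primary` and `Replica` classes and the `replicate` and `read` helpers are hypothetical and do not correspond to any specific database's API.

```python
class Primary:
    def __init__(self):
        self.lsn = 0
        self.data = {}
        self.log = []  # (lsn, key, value) entries in commit order

    def write(self, key, value) -> int:
        self.lsn += 1
        self.data[key] = value
        self.log.append((self.lsn, key, value))
        return self.lsn  # the session stores this as its last-write token


class Replica:
    def __init__(self):
        self.applied_lsn = 0
        self.data = {}


def replicate(primary: Primary, replica: Replica, up_to_lsn: int) -> None:
    """Apply log entries the replica has not seen yet, up to the given LSN."""
    for lsn, key, value in primary.log:
        if replica.applied_lsn < lsn <= up_to_lsn:
            replica.data[key] = value
            replica.applied_lsn = lsn


def read(session_lsn: int, primary: Primary, replica: Replica, key):
    """Serve locally only if the replica already contains the session's last write."""
    if replica.applied_lsn >= session_lsn:
        return replica.data.get(key), "replica"
    return primary.data.get(key), "primary"


primary, replica = Primary(), Replica()
session_lsn = primary.write("status", "away")
before = read(session_lsn, primary, replica, "status")  # replica is behind
replicate(primary, replica, up_to_lsn=primary.lsn)
after = read(session_lsn, primary, replica, "status")   # replica has caught up
print(before, after)
```

Checking replica progress against the session token avoids relying on a fixed time window, which would break whenever replication lag exceeds its typical value.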
Design Tradeoffs
| Dimension | Active-Passive (Single-Primary) | Active-Active (Multi-Primary) |
|---|---|---|
| Write conflict risk | Zero; all writes route through a single primary region — no concurrent regional writes to reconcile | High; concurrent writes to the same record from different regions require deterministic conflict resolution via CRDTs or LWW |
| Write latency for remote users | High; users geographically distant from the primary pay the full cross-region round-trip latency on every write | Low; each user writes to their nearest local region, achieving local write latency regardless of geographic distribution |
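| Failure recovery (RPO/RTO) | RPO > 0 during primary region failure; asynchronous replication lag means recent writes may be lost during failover, and RTO includes the time to promote a new primary | RPO > 0 unless replication is synchronous; writes accepted by the failed region but not yet replicated are lost or stranded until it recovers. RTO is near zero because surviving regions are already live and traffic only needs to be rerouted |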
Best Practices
When to Use / Avoid
| Use When | Avoid When |
|---|---|
| You have strict high-availability SLAs (99.99%+) where the business cannot tolerate downtime even during a cloud provider's regional outage. | The application is a simple, low-traffic internal tool where a few hours of downtime during a disaster is acceptable. |
| Serving a globally distributed user base that requires sub-100ms response times for both reads and writes. | The budget is highly constrained; multi-region deployments double or triple infrastructure and data egress costs. |
| Building highly collaborative, event-driven systems where data can be naturally modeled as append-only event streams. | The system relies on legacy, monolithic relational databases that do not natively support distributed consensus or replication. |