← System Design Data & Messaging Systems
System Design

Consistent Hashing

Minimize data movement during cluster scaling by using consistent hashing instead of modulo hashing.

TL;DR
  • Minimize data movement during cluster scaling by using consistent hashing instead of modulo hashing.
  • Prevent load imbalance across physical nodes by mapping them to multiple virtual nodes on the hash ring.
  • Route requests with O(log N) lookup complexity using client-side or proxy-side ring routing.
  • Mitigate cache stampedes and hot spots by dynamically replicating highly requested keys across adjacent ring nodes.

The Problem

When scaling a cluster of caching servers (like Memcached) or distributed databases, traditional modulo hashing (hash(key) mod N) fails catastrophically when the number of nodes (N) changes. If a single node crashes or a new node is added, almost every key hashes to a different node.

This triggers a massive cache wipeout, forcing downstream databases to handle a sudden flood of read requests, which often leads to database exhaustion and system-wide cascading failures.

Core System Idea

Consistent hashing solves this by mapping both the keys and the physical nodes to a circular hash ring with a fixed hash space (e.g., 0 to 2^32 - 1). To find which node stores a key, the system hashes the key and traverses the ring clockwise until it encounters the first node whose hash is greater than or equal to the key's hash.

When a node is added or removed, only 1/N of the keys need to be remapped to different physical servers.

To prevent uneven distribution of keys (load imbalance) caused by non-uniform node distribution on the ring, consistent hashing introduces "virtual nodes" (vnodes). Each physical node is assigned multiple virtual nodes spread across the ring. This ensures that if a physical node fails, its load is distributed evenly among all remaining nodes rather than overloading a single clockwise neighbor.

System Flow

flowchart TD A["Key: Session_99"] -->|"Hash"| B["Hash Value: 150"] C["Hash Ring: 0 to 2^32"] -->|"Position 150"| D{"Clockwise Search"} D -->|"Finds Node at 180"| E["Virtual Node: NodeB_3"] E -->|"Map to Physical"| F["Physical Server B"] G["New Node Added"] -->|"Minimal Remap"| C

Consistent hashing ring routing a key to its clockwise virtual node, minimizing data movement during node additions.

Real-World Examples Indicative

Cassandra 128 vnodes with MurmurHash3 — ~10s gossip convergence

Apache Cassandra uses MurmurHash3 to place 128 virtual tokens per node on a hash ring ranging from -(2^63) to +(2^63)-1. When a new node joins a 10-node cluster, it claims ~128 of the 1,280 total tokens and receives only the data for those token ranges rather than a full copy. Ring convergence — all nodes agreeing on the new topology via gossip — completes in ~10 seconds for a 100-node cluster. DataStax benchmarks confirm that adding one node to a 10-node cluster transfers exactly ~10% of total data, matching the theoretical 1/N prediction and avoiding the full-cluster reshuffles that modulo hashing would require.

Discord Memcached — 150 vnodes preventing 92% cache invalidation

Discord's Memcached tier uses a consistent hashing client with 150 virtual nodes per physical server. Before switching from modulo hashing in 2017, a single server failure invalidated ~92% of cached game session data as nearly every key remapped to a different node. With 150 vnodes, a node failure now remaps only the 1/N keys owned by that node to its clockwise neighbors. During a 2022 major game-launch peak, this prevented the cascading database overload that the old modulo scheme would have triggered — only the affected node's key range hit the database as a cold-start miss.

DynamoDB adaptive capacity — automatic hot partition split

DynamoDB uses consistent hashing internally to distribute partition keys across storage nodes as virtual node ranges. When partition monitoring detects a hot partition receiving disproportionate traffic, adaptive capacity automatically remaps that virtual node range to a dedicated isolated node — no customer action required. During the 2020 AWS re:Invent workload surge, DynamoDB performed 3,000+ automatic hot partition splits per hour, redistributing read/write load without any customer-visible changes or key migrations, because each virtual node can be reassigned to a new physical node via a ring metadata update alone.

Anti-Patterns

Using Too Few Virtual Nodes

Configuring a low number of virtual nodes (e.g., fewer than 50 per physical node) leads to severe load imbalance, where some servers run out of memory while others sit idle.

Ignoring Physical Server Capacity

Treating servers with different CPU/RAM specifications equally on the ring causes weaker servers to crash under standard load.

Static Ring Configurations

Hardcoding the list of nodes in client applications prevents dynamic scaling and requires full application redeployments to add or remove servers.

Neglecting Ring State Synchronization

Allowing split-brain scenarios where different clients have different views of the hash ring leads to inconsistent writes and cache misses.

Design Tradeoffs

DimensionConsistent HashingModulo Hashing
Key remapping on resizeOnly 1/N keys remapped to clockwise neighbors; remaining cache stays valid across the clusterNearly 100% of keys remap to different nodes on any cluster size change, causing a full cache wipeout
Lookup complexityO(log N) binary search on a sorted ring; slight overhead in client or proxy ring traversalO(1); compute hash mod N — no ring data structure or sorted search required
Load uniformityWith 128+ virtual nodes per server, load variance across physical nodes stays within ~5% of averageUniform only when cluster size is fully static; any addition or removal destroys key distribution
Hot node failure blast radiusFailed node's load distributes across all servers proportionally via vnode spreadingAll traffic for the failed node's keys hits a single clockwise neighbor, potentially causing cascading overload

Best Practices

Configure 128 to 256 Virtual NodesThis range provides an optimal balance between uniform data distribution (typically within 5% of average) and ring traversal performance.
Scale Virtual Nodes by Hardware CapacityAssign more virtual nodes to physical servers with higher CPU and RAM capacities to ensure they take a proportionally larger share of the load.
Use Gossip Protocols for Ring StateImplement a decentralized gossip protocol (like Cassandra does) or a centralized coordinator (like ZooKeeper) to distribute ring changes reliably.
Implement Replication on the RingFor distributed databases, write data not just to the first clockwise node, but also to its N-1 clockwise physical neighbors to guarantee high availability.
Use MurmurHash3Use a fast, non-cryptographic hash function like MurmurHash3 to generate ring positions quickly with excellent distribution properties.

When to Use / Avoid

Use WhenAvoid When
Building large-scale distributed caching layers where node membership changes dynamically.The cluster size is completely static and nodes are never added or removed.
Designing distributed, partition-tolerant databases that require horizontal scaling.The dataset is small enough to fit on a single machine, making distribution unnecessary.
Implementing load balancers that require session affinity without storing session state.The application requires strict, global ordering of keys across all physical nodes.