Data & Messaging Systems · #56

Distributed ID Generation

Generate globally unique, time-ordered 64-bit IDs at scale without relying on a centralized database coordinator.

Published May 29, 2026 · By MortalApps

TL;DR

Structure IDs using a Snowflake-style layout (Timestamp + Machine ID + Sequence) to ensure rough monotonicity.
Mitigate clock skew by monitoring NTP synchronization and refusing to issue IDs (or waiting) whenever the clock moves backward.
Avoid index fragmentation in relational databases by replacing random UUIDs with time-ordered sequential IDs.


The Problem

In a highly distributed, high-throughput system, generating unique identifiers with a single database's auto-incrementing primary key creates a single point of failure and a severe write bottleneck, because every service must obtain each new key from that one database.

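Conversely, generating random UUIDs (UUIDv4) on client nodes removes the bottleneck but hurts database performance. Consecutive UUIDv4 values are uncorrelated, so each insert lands on an arbitrary leaf page of the B-Tree index instead of the rightmost one. The result is frequent page splits, fragmented and partially filled pages, a poorer buffer-cache hit rate, more random disk I/O, and degraded write throughput. At 128 bits, UUIDs are also twice the size of a 64-bit integer key.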

Core System Idea

Distributed ID generation solves this by producing 64-bit integers that are globally unique, roughly time-ordered (strictly increasing on a single node, and approximately ordered across nodes to the extent their clocks agree), and generated without per-request coordination. A widely used pattern is the Snowflake ID architecture.

A Snowflake ID is a 64-bit integer divided into four bit fields, from most to least significant:

1. Sign Bit (1 bit): Always 0, so the ID stays positive in a signed 64-bit integer.
2. Timestamp (41 bits): Milliseconds elapsed since a custom epoch; 2^41 ms lets the generator operate for ~69 years.
3. Machine/Worker ID (10 bits): Uniquely identifies the generator node, supporting up to 1,024 concurrent nodes.
4. Sequence Number (12 bits): A per-node counter that increments for every ID generated within the same millisecond and resets to 0 when the millisecond advances. This supports up to 4,096 IDs per millisecond per node.
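The layout above can be expressed as a pair of pack/unpack helpers. This is a minimal Python sketch; the names `compose` and `decompose` are illustrative and not part of any Snowflake library. The timestamp argument is the offset from whatever custom epoch the deployment chooses.

```python
# Field widths from the layout above; the sign bit is implicitly 0.
TIMESTAMP_BITS = 41
WORKER_BITS = 10
SEQUENCE_BITS = 12

MAX_TIMESTAMP = (1 << TIMESTAMP_BITS) - 1
MAX_WORKER_ID = (1 << WORKER_BITS) - 1        # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1       # 4095

WORKER_SHIFT = SEQUENCE_BITS                  # 12
TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_BITS # 22


def compose(timestamp_offset_ms: int, worker_id: int, sequence: int) -> int:
    """Pack the three fields into one non-negative 64-bit integer."""
    if not 0 <= timestamp_offset_ms <= MAX_TIMESTAMP:
        raise ValueError("timestamp offset out of 41-bit range")
    if not 0 <= worker_id <= MAX_WORKER_ID:
        raise ValueError("worker ID out of 10-bit range")
    if not 0 <= sequence <= MAX_SEQUENCE:
        raise ValueError("sequence out of 12-bit range")
    return (timestamp_offset_ms << TIMESTAMP_SHIFT) | (worker_id << WORKER_SHIFT) | sequence


def decompose(snowflake_id: int) -> tuple[int, int, int]:
    """Split an ID back into (timestamp offset, worker ID, sequence)."""
    timestamp_offset_ms = snowflake_id >> TIMESTAMP_SHIFT
    worker_id = (snowflake_id >> WORKER_SHIFT) & MAX_WORKER_ID
    sequence = snowflake_id & MAX_SEQUENCE
    return timestamp_offset_ms, worker_id, sequence
```

Because the timestamp sits in the highest bits, `compose(100, 1023, 4095)` is still smaller than `compose(101, 0, 0)`: a later millisecond always outranks any worker ID or sequence value. With every field at its maximum, the result is exactly `2**63 - 1`, the largest signed 64-bit value.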

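Because the timestamp occupies the most significant bits below the sign bit, sorting Snowflake IDs numerically sorts them by creation time (to millisecond precision, and across nodes only as accurately as their clocks agree). New IDs therefore land at the right-hand edge of a B-Tree index, preserving insertion locality and largely avoiding fragmentation, while independent nodes still generate IDs concurrently without network coordination.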

System Flow

```mermaid
flowchart TD
    A["ID Generation Request"] --> B{"Get Current Millisecond"}
    B -->|"Same Millisecond"| C["Increment Local Sequence"]
    B -->|"New Millisecond"| D["Reset Sequence to 0"]
    B -->|"Clock Moved Backward"| H["Refuse or Wait for Clock to Catch Up"]
    C -->|"Sequence OK"| F["Assemble Bit Fields"]
    C -->|"Sequence Overflow over 4095"| E["Wait for Next Millisecond"]
    E --> D
    D --> F
    F -->|"Output"| G["64-bit Snowflake ID"]
```

Snowflake ID generation logic handling sequence increments, millisecond transitions, and overflow protection.
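The flow can be sketched as a single-node generator in Python. This is a minimal sketch, not Twitter's implementation: the class name `SnowflakeGenerator`, the injectable `clock_ms` parameter (there so the logic can be tested with a fake clock), and the choice to raise on a backward clock are all assumptions.

```python
import threading
import time
from typing import Callable

SEQUENCE_MASK = (1 << 12) - 1      # 4095
MAX_TIMESTAMP = (1 << 41) - 1


def system_clock_ms() -> int:
    """Current Unix time in milliseconds."""
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    """Hypothetical single-node generator following the flow above."""

    def __init__(self, worker_id: int, epoch_ms: int,
                 clock_ms: Callable[[], int] = system_clock_ms) -> None:
        if not 0 <= worker_id <= 1023:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.epoch_ms = epoch_ms
        self._clock_ms = clock_ms
        self._lock = threading.Lock()   # serialize ID generation within this process
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                # Clock moved backward: refuse rather than reissue old IDs.
                raise RuntimeError(f"clock moved backward by {self._last_ms - now} ms")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & SEQUENCE_MASK
                if self._sequence == 0:
                    # 4,096 IDs already issued this millisecond: spin until the
                    # clock advances; the sequence has already wrapped back to 0.
                    while now <= self._last_ms:
                        now = self._clock_ms()
            else:
                self._sequence = 0      # new millisecond
            offset = now - self.epoch_ms
            if not 0 <= offset <= MAX_TIMESTAMP:
                raise RuntimeError("timestamp outside the 41-bit range for this epoch")
            self._last_ms = now
            # Assemble bit fields: timestamp | worker ID | sequence.
            return (offset << 22) | (self.worker_id << 12) | self._sequence
```

Busy-waiting on overflow is acceptable here because the wait is at most about one millisecond; the lock makes the generator safe to share between threads in one process, but two processes still need different worker IDs.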

Real-World Examples (Indicative)

Twitter Snowflake — 100K IDs/sec with NTP skew protection

Twitter open-sourced Snowflake in 2010 to replace MySQL auto-increment tweet IDs, which had become a write bottleneck. Each Snowflake instance is a Thrift service: 41-bit millisecond timestamp (custom epoch: Nov 4, 2010) + 10-bit worker ID (assigned via ZooKeeper) + 12-bit sequence. At peak, ~20 Snowflake worker nodes generate ~100K unique IDs/sec cluster-wide, each capable of 4,096 IDs/ms. During a 2012 NTP clock correction that shifted a node's clock backward, Snowflake detected the skew (last_timestamp > current_timestamp), blocked generation for 45ms, and resumed cleanly — zero duplicate IDs were produced.

Discord — creation time embedded in ID, no created_at column

Discord generates Snowflake IDs for every message, channel, and server. The Discord epoch starts January 1, 2015 00:00:00 UTC. Clients extract message creation time with a right bit-shift, created_at = (id >> 22) + DISCORD_EPOCH, where DISCORD_EPOCH is expressed in Unix milliseconds; no database query is needed. This eliminates the created_at column from their Cassandra message storage, saving ~8 bytes per row across 4B+ stored messages. Discord exposes this in its public API, allowing bots and tools to reconstruct message timelines purely from ID values without additional lookups.
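The extraction formula can be checked in a few lines of Python. `DISCORD_EPOCH_MS` is the January 1, 2015 UTC epoch expressed in Unix milliseconds, the function name `discord_created_at` is illustrative, and the sample ID is just an example value. The low 22 bits are discarded because they hold the worker and sequence fields; `timedelta` keeps the arithmetic in integers, avoiding float rounding.

```python
from datetime import datetime, timedelta, timezone

DISCORD_EPOCH_MS = 1_420_070_400_000   # 2015-01-01T00:00:00Z in Unix milliseconds
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def discord_created_at(snowflake: int) -> datetime:
    """Recover creation time from the top 42 bits, with no database lookup."""
    unix_ms = (snowflake >> 22) + DISCORD_EPOCH_MS
    return UNIX_EPOCH + timedelta(milliseconds=unix_ms)


print(discord_created_at(175928847299117063))
# 2016-04-30 11:18:25.796000+00:00
```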

Shopify order IDs — Redis-backed worker ID registry at 10K orders/sec

Shopify generates order IDs using a Snowflake-derived scheme with worker IDs registered in Redis. Each application pod claims a 10-bit worker ID slot from a Redis sorted set on startup and holds it by renewing a 30-second lease via heartbeat. During Black Friday 2022, at 10K+ order confirmations/sec across 10+ worker pods, a pod restart triggered automatic worker ID re-assignment: the new pod found the expired slot and claimed a fresh ID within 2 seconds, avoiding ID collisions during the highest-traffic checkout window of the year.

Anti-Patterns

Using Random UUIDs (UUIDv4) as Clustered Index Keys

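Inserting random 128-bit values into a clustered index (for example, an InnoDB primary key) scatters writes across the whole B-Tree: pages split repeatedly, are left partially filled, and must be read back from disk once the index outgrows memory, severely degrading write performance.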
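A toy model makes the locality difference visible. The sketch below treats an index's leaf order as a sorted Python list and counts how many inserts land at the right-hand end (an append) rather than somewhere in the middle, where a real B-Tree would have to modify an interior leaf page. It is an illustration under that simplifying assumption, not a storage-engine benchmark; `append_fraction` is a hypothetical helper.

```python
import bisect
import uuid


def append_fraction(keys: list[int]) -> float:
    """Fraction of inserts that land at the right-hand end of a sorted index."""
    index: list[int] = []
    appends = 0
    for key in keys:
        pos = bisect.bisect_left(index, key)
        if pos == len(index):
            appends += 1          # rightmost position: the cheap, local case
        index.insert(pos, key)    # interior position: an arbitrary leaf in a real B-Tree
    return appends / len(keys)


N = 10_000
time_ordered = [ms << 22 for ms in range(N)]          # Snowflake-like keys
random_keys = [uuid.uuid4().int for _ in range(N)]    # UUIDv4 keys

print(f"time-ordered: {append_fraction(time_ordered):.3f}")   # 1.000
print(f"UUIDv4:       {append_fraction(random_keys):.3f}")
```

Every time-ordered key is an append, while only a tiny fraction of random keys are (on average about ln(N)/N, so roughly 0.001 for 10,000 keys); the rest land in the middle of the index.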

Ignoring Clock Skew

If a generator node's system clock is set backward (e.g., by an NTP step correction), the node re-enters milliseconds it has already used and, with the sequence reset to 0, can emit IDs that duplicate previously issued values.

Hardcoding Machine IDs

Assigning static machine IDs in containerized environments (like Kubernetes) leads to ID collisions when two pods start with the same identifier, which is easy to do when replicas are created from one shared configuration.

Using Centralized Locks for ID Generation

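Relying on a central Redis or ZooKeeper lock to coordinate every ID request defeats the purpose of a distributed generator: it reintroduces a single point of failure and caps throughput at the lock service's round-trip rate. Coordination belongs only in the one-time worker ID assignment at startup.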

Design Tradeoffs

Dimension	Snowflake IDs (64-bit)	UUID v4 (128-bit)
Index performance	Roughly time-ordered; preserves B-Tree insertion locality and minimizes page splits and fragmentation	Completely random; inserts scatter across the B-Tree, causing frequent page splits, fragmentation, and random disk I/O
Coordination overhead	Requires a one-time machine ID assignment per node via ZooKeeper, Consul, or Redis on startup	Zero coordination required; each node generates IDs independently with no service dependency
Capacity and lifespan	4,096 IDs/ms per node for ~69 years (41-bit timestamp); exhaustion requires re-epoching the generator	No lifespan limit; with 122 random bits, collision probability is negligible at any practical scale
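The Snowflake capacity figures in the table follow directly from the bit widths. The arithmetic, assuming a 365.25-day year:

```python
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25

lifespan_years = (1 << 41) / MS_PER_YEAR        # 41-bit millisecond timestamp
ids_per_ms_per_node = 1 << 12                   # 12-bit sequence
ids_per_sec_per_node = ids_per_ms_per_node * 1000
max_nodes = 1 << 10                             # 10-bit worker ID
ids_per_sec_cluster = ids_per_sec_per_node * max_nodes

print(f"lifespan: {lifespan_years:.1f} years")          # lifespan: 69.7 years
print(f"per node: {ids_per_sec_per_node:,} IDs/s")      # per node: 4,096,000 IDs/s
print(f"1,024 nodes: {ids_per_sec_cluster:,} IDs/s")    # 1,024 nodes: 4,194,304,000 IDs/s
```

These are theoretical ceilings; real throughput per node is usually limited by the service around the generator rather than by the sequence field.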

Best Practices

Implement Clock Skew Protection

If the system detects that the current time is behind the last recorded timestamp, refuse to generate IDs (throw an error or wait for the clock to catch up) to prevent duplicates.
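One way to combine both options is to wait out small backward steps and fail fast on large ones. In this sketch, the function name `wait_for_clock` and the 5 ms `MAX_BACKWARD_WAIT_MS` threshold are hypothetical choices; the clock and sleep functions are injectable so the policy can be tested.

```python
import time
from typing import Callable

MAX_BACKWARD_WAIT_MS = 5  # hypothetical tolerance; tune for your environment


def wait_for_clock(last_ms: int, clock_ms: Callable[[], int],
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Return a clock reading >= last_ms, waiting out small backward steps.

    Regressions up to MAX_BACKWARD_WAIT_MS are slept through; anything larger
    raises so the node stops issuing IDs instead of risking duplicates.
    """
    now = clock_ms()
    drift = last_ms - now
    if drift <= 0:
        return now                          # clock is at or ahead of the last ID
    if drift > MAX_BACKWARD_WAIT_MS:
        raise RuntimeError(f"clock moved backward by {drift} ms; refusing to issue IDs")
    sleep(drift / 1000)                     # time.sleep takes seconds
    now = clock_ms()
    if now < last_ms:
        raise RuntimeError("clock still behind after waiting; refusing to issue IDs")
    return now
```

The caller still handles the case where the returned reading equals `last_ms` by incrementing the sequence, exactly as for any other repeated millisecond.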

Use Dynamic Machine ID Allocation

Use a coordination service like ZooKeeper, Consul, or etcd to dynamically assign unique Machine IDs to generator nodes when they boot up.
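The lease logic behind dynamic allocation can be sketched with an in-memory stand-in. `WorkerIdRegistry` is hypothetical: in a real deployment the leases would live in the coordination service itself (for example, etcd leases or ZooKeeper ephemeral nodes), `claim` would need to be an atomic create-if-absent, and a node whose renewal fails must stop issuing IDs before another node can reclaim its slot.

```python
import time
from typing import Callable


class WorkerIdRegistry:
    """Hypothetical in-memory stand-in for worker-ID leases in a coordination service."""

    def __init__(self, ttl_s: float = 30.0, max_ids: int = 1024,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.max_ids = max_ids          # 10-bit worker field -> 1,024 IDs
        self._clock = clock
        self._leases: dict[int, tuple[str, float]] = {}  # worker_id -> (owner, expiry)

    def claim(self, owner: str) -> int:
        """Lease the lowest free or expired worker ID to `owner`."""
        now = self._clock()
        for worker_id in range(self.max_ids):
            lease = self._leases.get(worker_id)
            if lease is None or lease[1] <= now:
                self._leases[worker_id] = (owner, now + self.ttl_s)
                return worker_id
        raise RuntimeError("all worker IDs are leased")

    def renew(self, owner: str, worker_id: int) -> bool:
        """Heartbeat: extend the lease; False means the caller must stop issuing IDs."""
        now = self._clock()
        lease = self._leases.get(worker_id)
        if lease is None or lease[0] != owner or lease[1] <= now:
            return False
        self._leases[worker_id] = (owner, now + self.ttl_s)
        return True
```

With a 30-second TTL, a pod that keeps renewing holds its ID indefinitely, while a crashed pod's ID becomes claimable once its last lease expires.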

Extract Metadata from IDs

Design your application to extract the creation timestamp directly from the Snowflake ID, saving database storage by eliminating separate created_at columns where appropriate.
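Dropping a created_at column does not rule out time-range queries: because the timestamp occupies the top bits, a time window maps to a contiguous ID range. A sketch, assuming the 41/10/12 layout and, for concreteness, the Discord epoch (January 1, 2015 UTC); `id_bounds` and the `messages` table name are hypothetical.

```python
from datetime import datetime, timedelta, timezone

CUSTOM_EPOCH_MS = 1_420_070_400_000     # assumption: 2015-01-01T00:00:00Z
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MS = timedelta(milliseconds=1)


def id_bounds(start: datetime, end: datetime) -> tuple[int, int]:
    """Smallest and largest IDs whose embedded time falls in [start, end)."""
    start_offset = (start - UNIX_EPOCH) // ONE_MS - CUSTOM_EPOCH_MS
    end_offset = (end - UNIX_EPOCH) // ONE_MS - CUSTOM_EPOCH_MS
    lo = start_offset << 22             # worker ID and sequence bits all zero
    hi = (end_offset << 22) - 1         # last possible ID of the millisecond before `end`
    return lo, hi


lo, hi = id_bounds(datetime(2016, 4, 30, tzinfo=timezone.utc),
                   datetime(2016, 5, 1, tzinfo=timezone.utc))
print(f"SELECT * FROM messages WHERE id BETWEEN {lo} AND {hi};")
```

The range filter runs against the primary-key index directly, so no secondary index on a timestamp column is needed.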

Use UUIDv7 for 128-bit Requirements

If your system requires 128-bit IDs (e.g., for interoperability with existing UUID columns or for compliance), use UUIDv7, which places a 48-bit Unix millisecond timestamp in its most significant bits instead of random data, preserving write locality.
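The UUIDv7 layout defined in RFC 9562 (48-bit timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits) can be assembled by hand with the standard library. `make_uuid7` is a hypothetical helper; this minimal sketch fills every non-timestamp field with random bits and does not add the optional monotonic counter that RFC 9562 permits, so two UUIDs from the same millisecond are not ordered relative to each other.

```python
import os
import time
import uuid
from typing import Optional


def make_uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7: 48-bit Unix ms timestamp first, then version, variant, random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits; 74 are used
    rand_a = rand >> 68                            # top 12 bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 bits
    value = (unix_ms & ((1 << 48) - 1)) << 80      # bits 127..80: timestamp
    value |= 0x7 << 76                             # bits 79..76: version 7
    value |= rand_a << 64                          # bits 75..64: rand_a
    value |= 0b10 << 62                            # bits 63..62: RFC variant
    value |= rand_b                                # bits 61..0: rand_b
    return uuid.UUID(int=value)


u = make_uuid7()
print(u, u.version)
```

Because the timestamp leads, UUIDs from later milliseconds compare greater than earlier ones, giving the same right-edge insertion pattern as Snowflake IDs at twice the key size.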

Pre-allocate Machine IDs in CI/CD

Ensure your deployment pipelines validate that no two active containers share the same machine ID configuration.
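A pipeline check for statically assigned IDs can be as small as the function below. It assumes a hypothetical configuration already loaded as a mapping from deployment name to worker ID; how that mapping is read from your manifests is left out.

```python
from collections import Counter

MAX_WORKER_ID = 1023  # 10-bit worker field


def validate_worker_ids(assignments: dict[str, int]) -> list[str]:
    """Return a list of problems; an empty list means the configuration is safe."""
    problems = []
    for name, worker_id in assignments.items():
        if not 0 <= worker_id <= MAX_WORKER_ID:
            problems.append(f"{name}: worker ID {worker_id} outside 0..{MAX_WORKER_ID}")
    counts = Counter(assignments.values())
    for worker_id, n in sorted(counts.items()):
        if n > 1:
            owners = sorted(k for k, v in assignments.items() if v == worker_id)
            problems.append(f"worker ID {worker_id} shared by {', '.join(owners)}")
    return problems
```

In CI, a non-empty result would fail the build, for example by printing each problem and exiting with a non-zero status.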

When to Use / Avoid

Use When	Avoid When
High-throughput distributed systems requiring unique, time-ordered primary keys.	Simple, single-instance applications where standard database auto-incrementing keys are sufficient.
Relational databases where index performance and write throughput are critical.	Systems where IDs must be completely unpredictable (e.g., password reset tokens or public API keys).
Applications that benefit from embedding creation timestamps directly inside the ID.	Environments where network coordination for Machine ID assignment is impossible.