Skip to main contentSkip to user menuSkip to navigation

Twitter Snowflake ID Generator

Design a distributed unique ID generator like Twitter Snowflake: requirements, algorithms, and trade-offs.

25 min readAdvanced
Not Started
Loading...

Problem Statement

Design a system that can generate billions of unique IDs per day across multiple servers. The IDs should be sortable by time, compact for storage, and generated with high availability and low latency.

Key Requirements

  • • Generate 10,000 unique IDs per second per server
  • • IDs must be globally unique across all servers
  • • IDs should be sortable by time (roughly ordered)
  • • System should scale to 1000+ servers
  • • Low latency (<1ms) and high availability (99.9%)

Step 1: Clarify Requirements

Functional Requirements

  • Uniqueness: No duplicate IDs across all servers and time
  • Sortability: IDs generated later should be larger (roughly)
  • Format: 64-bit integers for efficient storage/indexing
  • Numerical: Not strings (for performance)

Non-Functional Requirements

  • Scale: 10K IDs/second per server, 1000 servers
  • Latency: <1ms generation time
  • Availability: 99.9% uptime, fault tolerant
  • Independence: No coordination between servers

Step 2: Solution Approaches

❌ UUID (128-bit)

Pros:
  • Globally unique
  • No coordination needed
  • Decentralized generation
Cons:
  • 128 bits (too large)
  • Not time-ordered
  • String format issues

⚠️ Database Auto-increment

Pros:
  • Unique and sequential
  • 64-bit integers
  • Time-ordered
Cons:
  • Single point of failure
  • Hard to scale
  • DB bottleneck

✅ Snowflake Algorithm

Pros:
  • 64-bit integers
  • Time-ordered
  • Decentralized
  • High performance
Cons:
  • Clock synchronization
  • Machine ID management
  • Limited lifetime (69 years)

Step 3: Snowflake Algorithm Design

64-bit Structure

Sign
1 bit
Always 0
Timestamp
41 bits
Milliseconds since epoch
Machine ID
10 bits
1024 machines
Sequence
12 bits
4096 per ms

Timestamp (41 bits)

  • • Milliseconds since custom epoch (e.g., 2020-01-01)
  • • 2^41 = ~69 years of IDs
  • • Provides time ordering
  • • Most significant bits for proper sorting

Machine ID + Sequence

  • • 10 bits = 1024 unique machines
  • • 12 bits = 4096 IDs per millisecond per machine
  • • Combined: 4M IDs per second per machine
  • • Sequence resets each millisecond

Algorithm Implementation

function generateId():
  current_timestamp = current_time_millis()
  
  if current_timestamp < last_timestamp:
    throw ClockBackwardsException
    
  if current_timestamp == last_timestamp:
    sequence = (sequence + 1) & sequence_mask
    if sequence == 0:
      current_timestamp = wait_next_millis()
  else:
    sequence = 0
    
  last_timestamp = current_timestamp
  
  return ((current_timestamp - epoch) << 22) |
         (machine_id << 12) |
         sequence

Step 4: System Architecture

High-Level Architecture

Client → Load Balancer → ID Generator Servers → Return 64-bit ID
Clients
Load Balancer
ID Generators
64-bit IDs

ID Generator Servers

  • • Each server has unique machine ID
  • • In-memory generation (no DB dependency)
  • • Stateless and horizontally scalable
  • • Handle 10K+ IDs per second

Machine ID Assignment

  • • Configuration service assigns IDs
  • • Static assignment at deployment
  • • Range allocation for auto-scaling
  • • Backup allocation for failover

Clock Synchronization

  • • NTP for time synchronization
  • • Handle clock backwards scenario
  • • Wait for next millisecond when needed
  • • Monitor clock drift alerting

High Availability

  • • Multiple data centers
  • • Health check endpoints
  • • Circuit breaker patterns
  • • Graceful degradation

Edge Cases & Solutions

Clock Goes Backwards

Problem:

NTP correction or server restart causes clock to go backwards, potentially creating duplicate IDs.

Solution:

Throw exception and wait until clock catches up to last timestamp, or use backup machine ID.

Sequence Overflow

Problem:

More than 4096 IDs requested in single millisecond on one machine.

Solution:

Wait for next millisecond or distribute load across multiple machines.

Machine ID Conflicts

Problem:

Two servers accidentally get same machine ID due to configuration error.

Solution:

Central registry service, automated deployment, health checks to detect conflicts.

Real-World Implementations

Twitter Snowflake

  • • Original Snowflake implementation
  • • 1 sign + 41 timestamp + 10 machine + 12 sequence
  • • Used for Tweet IDs, User IDs
  • • Open sourced the algorithm

Instagram

  • • Modified Snowflake for photo IDs
  • • 41 time + 13 shard + 10 sequence
  • • Uses PostgreSQL schemas as shards
  • • Handles billions of photos

Discord

  • • Snowflake for message IDs
  • • Custom epoch (Discord launch date)
  • • Enables message sorting by creation time
  • • Handles billions of messages daily

Sony PlayStation Network

  • • Modified for user activity tracking
  • • Different bit allocation for longer lifespan
  • • Multi-region deployment
  • • High availability requirements

Key Takeaways

  • • Snowflake algorithm is battle-tested at massive scale
  • • Bit allocation can be customized based on requirements
  • • Clock synchronization is critical for correctness
  • • Machine ID management needs careful planning
  • • Consider custom epoch to extend lifetime
No quiz questions available
Quiz ID "unique-id-generator" not found