Twitter Snowflake ID Generator
Design a distributed unique ID generator like Twitter Snowflake: requirements, algorithms, and trade-offs.
Problem Statement
Design a system that can generate billions of unique IDs per day across multiple servers. The IDs should be sortable by time, compact for storage, and generated with high availability and low latency.
Key Requirements
- • Generate 10,000 unique IDs per second per server
- • IDs must be globally unique across all servers
- • IDs should be sortable by time (roughly ordered)
- • System should scale to 1000+ servers
- • Low latency (<1ms) and high availability (99.9%)
Step 1: Clarify Requirements
Functional Requirements
- •Uniqueness: No duplicate IDs across all servers and time
- •Sortability: IDs generated later should be larger (roughly)
- •Format: 64-bit integers for efficient storage/indexing
- •Numerical: Not strings (for performance)
Non-Functional Requirements
- •Scale: 10K IDs/second per server, 1000 servers
- •Latency: <1ms generation time
- •Availability: 99.9% uptime, fault tolerant
- •Independence: No coordination between servers
Step 2: Solution Approaches
❌ UUID (128-bit)
- Globally unique
- No coordination needed
- Decentralized generation
- 128 bits (too large)
- Not time-ordered
- String format issues
⚠️ Database Auto-increment
- Unique and sequential
- 64-bit integers
- Time-ordered
- Single point of failure
- Hard to scale
- DB bottleneck
✅ Snowflake Algorithm
- 64-bit integers
- Time-ordered
- Decentralized
- High performance
- Clock synchronization
- Machine ID management
- Limited lifetime (69 years)
Step 3: Snowflake Algorithm Design
64-bit Structure
Timestamp (41 bits)
- • Milliseconds since custom epoch (e.g., 2020-01-01)
- • 2^41 = ~69 years of IDs
- • Provides time ordering
- • Most significant bits for proper sorting
Machine ID + Sequence
- • 10 bits = 1024 unique machines
- • 12 bits = 4096 IDs per millisecond per machine
- • Combined: 4M IDs per second per machine
- • Sequence resets each millisecond
Algorithm Implementation
function generateId(): current_timestamp = current_time_millis() if current_timestamp < last_timestamp: throw ClockBackwardsException if current_timestamp == last_timestamp: sequence = (sequence + 1) & sequence_mask if sequence == 0: current_timestamp = wait_next_millis() else: sequence = 0 last_timestamp = current_timestamp return ((current_timestamp - epoch) << 22) | (machine_id << 12) | sequence
Step 4: System Architecture
High-Level Architecture
ID Generator Servers
- • Each server has unique machine ID
- • In-memory generation (no DB dependency)
- • Stateless and horizontally scalable
- • Handle 10K+ IDs per second
Machine ID Assignment
- • Configuration service assigns IDs
- • Static assignment at deployment
- • Range allocation for auto-scaling
- • Backup allocation for failover
Clock Synchronization
- • NTP for time synchronization
- • Handle clock backwards scenario
- • Wait for next millisecond when needed
- • Monitor clock drift alerting
High Availability
- • Multiple data centers
- • Health check endpoints
- • Circuit breaker patterns
- • Graceful degradation
Edge Cases & Solutions
Clock Goes Backwards
NTP correction or server restart causes clock to go backwards, potentially creating duplicate IDs.
Throw exception and wait until clock catches up to last timestamp, or use backup machine ID.
Sequence Overflow
More than 4096 IDs requested in single millisecond on one machine.
Wait for next millisecond or distribute load across multiple machines.
Machine ID Conflicts
Two servers accidentally get same machine ID due to configuration error.
Central registry service, automated deployment, health checks to detect conflicts.
Real-World Implementations
Twitter Snowflake
- • Original Snowflake implementation
- • 1 sign + 41 timestamp + 10 machine + 12 sequence
- • Used for Tweet IDs, User IDs
- • Open sourced the algorithm
- • Modified Snowflake for photo IDs
- • 41 time + 13 shard + 10 sequence
- • Uses PostgreSQL schemas as shards
- • Handles billions of photos
Discord
- • Snowflake for message IDs
- • Custom epoch (Discord launch date)
- • Enables message sorting by creation time
- • Handles billions of messages daily
Sony PlayStation Network
- • Modified for user activity tracking
- • Different bit allocation for longer lifespan
- • Multi-region deployment
- • High availability requirements
Key Takeaways
- • Snowflake algorithm is battle-tested at massive scale
- • Bit allocation can be customized based on requirements
- • Clock synchronization is critical for correctness
- • Machine ID management needs careful planning
- • Consider custom epoch to extend lifetime