Building a Key-Value Store
Design a distributed key-value store: partitioning, replication, consistency, and fault tolerance.
25 min read • Advanced
Problem Statement
Design a distributed key-value store that can handle massive scale with high availability. The system should support basic operations (GET, PUT, DELETE) while providing strong consistency or high availability based on configuration.
Key Requirements
- • Support billions of key-value pairs
- • Handle 10 million QPS with low latency (<10ms)
- • High availability (99.9%+ uptime)
- • Configurable consistency levels
- • Automatic scaling and failure handling
Step 1: Clarify Requirements
Functional Requirements
- • Basic Operations: PUT(key, value), GET(key), DELETE(key) (a minimal API sketch follows this list)
- • Key Size: Up to 256 bytes
- • Value Size: Up to 10 KB
- • Data Types: Opaque binary data (strings, JSON, serialized objects, etc.)
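A minimal sketch of the client-facing contract implied by these operations. The class and constant names are illustrative, not from any particular system; they only encode the size limits and operations listed above.

```python
from abc import ABC, abstractmethod
from typing import Optional

MAX_KEY_BYTES = 256          # key size limit from the functional requirements
MAX_VALUE_BYTES = 10 * 1024  # 10 KB value size limit

class KeyValueStore(ABC):
    """Client-facing contract: keys and values are opaque, size-limited bytes."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None:
        """Store or overwrite the value for key."""

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]:
        """Return the value for key, or None if absent."""

    @abstractmethod
    def delete(self, key: bytes) -> bool:
        """Remove key; return True if it existed."""

def validate(key: bytes, value: bytes = b"") -> None:
    # Enforce the size limits before a request ever reaches a storage node.
    if len(key) > MAX_KEY_BYTES:
        raise ValueError("key exceeds 256 bytes")
    if len(value) > MAX_VALUE_BYTES:
        raise ValueError("value exceeds 10 KB")
```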
Non-Functional Requirements
- • Scale: 10 billion key-value pairs
- • QPS: 10 million reads, 1 million writes
- • Latency: <10ms P99 for reads, <20ms P99 for writes
- • Availability: 99.9%+ uptime with tunable consistency
Step 2: Back-of-Envelope Estimation
Storage
10B keys × 256B = 2.56 TB
10B values × 5KB avg = 50 TB
Total: ~53 TB
With replication (3x): 159 TB
QPS Breakdown
Reads: 10M QPS
Writes: 1M QPS
Read:Write = 10:1
Peak: 2x average load
Bandwidth
Read: 10M × 5KB = 50 GB/s
Write: 1M × 5KB = 5 GB/s
Total: 55 GB/s
Requires high-bandwidth networking between nodes
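The figures above can be reproduced with a few lines of arithmetic, assuming the 5 KB average value size and decimal units used in this estimate:

```python
KEYS = 10_000_000_000          # 10 billion key-value pairs
KEY_BYTES = 256
AVG_VALUE_BYTES = 5_000        # 5 KB average value
REPLICAS = 3

key_storage = KEYS * KEY_BYTES              # 2.56e12 B ≈ 2.56 TB
value_storage = KEYS * AVG_VALUE_BYTES      # 5.0e13 B  ≈ 50 TB
total = key_storage + value_storage         # ≈ 53 TB
replicated = total * REPLICAS               # ≈ 158 TB (≈ 159 TB if you round to 53 TB first)

READ_QPS, WRITE_QPS = 10_000_000, 1_000_000
read_bw = READ_QPS * AVG_VALUE_BYTES        # 5.0e10 B/s ≈ 50 GB/s
write_bw = WRITE_QPS * AVG_VALUE_BYTES      # 5.0e9 B/s  ≈ 5 GB/s

print(f"storage ≈ {replicated / 1e12:.0f} TB, "
      f"bandwidth ≈ {(read_bw + write_bw) / 1e9:.0f} GB/s")
```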
Step 3: High-Level Design
Core Components
Data Partitioning
- • Consistent hashing for even distribution
- • Virtual nodes to handle hot spots
- • Automatic rebalancing on node changes
Replication
- • N replicas (typically 3)
- • Replicas stored on different racks/AZs
- • Configurable consistency levels
Coordination
- • Gossip protocol for membership
- • Vector clocks for conflict resolution
- • Anti-entropy for repair
Client Interface
- • Load balancer for request routing
- • Client-side caching
- • Connection pooling
System Architecture
Client → Load Balancer → Coordinator (consistent hash ring) → Storage Nodes (N=3 replicas), with a background anti-entropy repair service
Step 4: Deep Dive
Consistent Hashing Implementation
Virtual Nodes
- • Each physical node maps to 100-200 virtual nodes
- • Reduces hot spots when nodes join/leave
- • Better load distribution across the ring
- • Faster rebalancing with smaller data chunks
Key Benefits
- • Minimal data movement on membership changes (only ~K/N keys move, for K keys and N nodes)
- • Automatic load balancing
- • Handles node failures gracefully
- • Scales horizontally
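A minimal consistent-hash ring with virtual nodes might look like the sketch below. The vnode count, hash function, and node names are illustrative choices, not prescriptions; the `preference_list` method returns the N distinct physical nodes that would hold a key's replicas.

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring with virtual nodes (vnodes) per physical node."""

    def __init__(self, vnodes: int = 150):
        self.vnodes = vnodes
        self._ring = []      # sorted list of (hash, physical node) entries
        self._hashes = []    # parallel sorted list of hashes for bisect lookups

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        # Place `vnodes` points for this node around the ring.
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            idx = bisect.bisect(self._hashes, h)
            self._hashes.insert(idx, h)
            self._ring.insert(idx, (h, node))

    def remove_node(self, node: str) -> None:
        keep = [(h, n) for h, n in self._ring if n != node]
        self._ring = keep
        self._hashes = [h for h, _ in keep]

    def preference_list(self, key: str, n: int = 3) -> list:
        """First n distinct physical nodes clockwise from the key's position."""
        if not self._ring:
            return []
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        nodes = []
        for i in range(len(self._ring)):
            node = self._ring[(idx + i) % len(self._ring)][1]
            if node not in nodes:
                nodes.append(node)
            if len(nodes) == n:
                break
        return nodes

ring = HashRing()
for node in ("node-a", "node-b", "node-c", "node-d"):
    ring.add_node(node)
print(ring.preference_list("user:42", n=3))   # e.g. ['node-c', 'node-a', 'node-d']
```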
Replication & Consistency
Quorum Consensus
- • N = 3 (total replicas)
- • W = 2 (write quorum)
- • R = 2 (read quorum)
- • W + R > N ensures read and write quorums overlap, so a read sees the latest acknowledged write
Consistency Levels
- • Strong: W=N, R=1
- • Eventual: W=1, R=1
- • Balanced: W=2, R=2
- • Configurable per operation
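A simplified coordinator illustrating quorum writes and reads. W and R are constructor parameters, so the consistency levels above map to different arguments; replica transport, timeouts, retries, and read repair are omitted, and the in-memory replicas stand in for real storage nodes.

```python
import time

class Replica:
    """Stand-in for a storage node: key -> (timestamp, value)."""
    def __init__(self):
        self.data = {}

    def write(self, key, value, ts):
        self.data[key] = (ts, value)
        return True

    def read(self, key):
        return self.data.get(key)            # (ts, value) or None

class QuorumCoordinator:
    def __init__(self, replicas, w=2, r=2):
        assert w + r > len(replicas), "W + R must exceed N for quorum overlap"
        self.replicas, self.w, self.r = replicas, w, r

    def put(self, key, value):
        ts = time.time_ns()
        # Send to all replicas; a real coordinator returns once W acks arrive.
        acks = sum(1 for rep in self.replicas if rep.write(key, value, ts))
        if acks < self.w:
            raise RuntimeError("write quorum not reached")

    def get(self, key):
        # Contact R replicas (here simply the first R) and keep the newest version.
        responses = [rep.read(key) for rep in self.replicas[:self.r]]
        versions = [v for v in responses if v is not None]
        if not versions:
            return None
        return max(versions)[1]

replicas = [Replica(), Replica(), Replica()]
store = QuorumCoordinator(replicas, w=2, r=2)   # the "balanced" N=3 configuration
store.put("user:42", b"alice")
print(store.get("user:42"))                     # b'alice'
```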
Conflict Resolution
- • Vector clocks for causality
- • Last-write-wins for simplicity
- • Application-level resolution
- • Merkle trees for repair
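A sketch of vector-clock comparison for detecting concurrent (conflicting) writes. The dict-of-counters representation is one common choice; the node names in the usage example are hypothetical.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of the clock with node's counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

def compare(a: dict, b: dict) -> str:
    """'before', 'after', 'equal', or 'concurrent' (a true conflict)."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"        # a causally precedes b; b supersedes a
    if b_le_a:
        return "after"
    return "concurrent"        # neither dominates: surface both versions to the app

v1 = increment({}, "node-a")            # {'node-a': 1}
v2 = increment(v1, "node-b")            # descends from v1
v3 = increment(v1, "node-c")            # also descends from v1
print(compare(v1, v2))                  # before
print(compare(v2, v3))                  # concurrent -> conflict to resolve
```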
Failure Detection & Recovery
Detection Mechanisms
- • Heartbeat messages every 500ms
- • Gossip protocol for membership
- • Phi accrual failure detector
- • Configurable timeout thresholds
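A simplified failure detector in the spirit of phi accrual: it tracks heartbeat inter-arrival times and reports a suspicion level that grows the longer the next heartbeat is overdue. The exponential model below is a simplification of the original Gaussian formulation, and the threshold is an illustrative default.

```python
import math
import time

class PhiFailureDetector:
    def __init__(self, threshold: float = 8.0, window: int = 100):
        self.threshold = threshold
        self.window = window
        self.intervals = []        # recent heartbeat inter-arrival times (seconds)
        self.last_heartbeat = None

    def heartbeat(self, now: float = None) -> None:
        now = now if now is not None else time.monotonic()
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
            self.intervals = self.intervals[-self.window:]
        self.last_heartbeat = now

    def phi(self, now: float = None) -> float:
        if self.last_heartbeat is None or not self.intervals:
            return 0.0
        now = now if now is not None else time.monotonic()
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        # Suspicion grows with how many mean intervals the heartbeat is overdue.
        return elapsed / (mean * math.log(10))

    def is_available(self, now: float = None) -> bool:
        return self.phi(now) < self.threshold

detector = PhiFailureDetector()
for t in (0.0, 0.5, 1.0, 1.5):           # heartbeats every 500 ms
    detector.heartbeat(now=t)
print(detector.is_available(now=1.8))    # True: only slightly overdue
print(detector.is_available(now=12.0))   # False: many intervals overdue
```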
Recovery Strategies
- • Temporary failure: Route requests to the remaining healthy replicas
- • Permanent failure: Add replacement node
- • Hinted handoff for writes
- • Anti-entropy repair for consistency
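A sketch of hinted handoff: writes destined for an unreachable replica are parked as "hints" on a healthy node and replayed when the target recovers. The node names and hint format are illustrative, and real systems also bound hint storage and expiry.

```python
from collections import defaultdict

class HintedHandoff:
    """Buffer writes for down replicas; replay them when the replica returns."""

    def __init__(self):
        # target node -> list of (key, value) writes it missed while down
        self.hints = defaultdict(list)

    def write(self, target: str, key: str, value: bytes, alive: set, store: dict):
        if target in alive:
            store[key] = value                       # normal replica write
        else:
            self.hints[target].append((key, value))  # park the write as a hint

    def replay(self, target: str, store: dict):
        """Called when `target` rejoins: push its missed writes, then drop the hints."""
        for key, value in self.hints.pop(target, []):
            store[key] = value

# Usage: node-c is down during a write, then recovers and is caught up.
handoff = HintedHandoff()
node_c_store = {}
handoff.write("node-c", "user:42", b"alice",
              alive={"node-a", "node-b"}, store=node_c_store)
print(node_c_store)            # {} -- write parked as a hint
handoff.replay("node-c", node_c_store)
print(node_c_store)            # {'user:42': b'alice'}
```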
CAP Theorem Trade-offs
CP System (Consistency + Partition Tolerance)
Config: W=N, R=1
Behavior: Blocks writes during partitions
Use case: Financial systems, user accounts
AP System (Availability + Partition Tolerance)
Config: W=1, R=1
Behavior: Always accepts reads/writes
Use case: Social media, content systems
Tunable Consistency
Config: W=2, R=2
Behavior: Balanced approach
Use case: Most applications, default choice
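One way to expose these trade-offs is a per-operation consistency parameter that maps to (W, R) quorum settings. The enum names below are illustrative; the mappings mirror the CP, AP, and tunable configurations above for N=3.

```python
from enum import Enum

N = 3   # replication factor

class Consistency(Enum):
    STRONG = "strong"       # CP-leaning: writes wait for every replica
    EVENTUAL = "eventual"   # AP-leaning: a single ack is enough
    QUORUM = "quorum"       # balanced default

def quorum_settings(level: Consistency) -> tuple:
    """Map a consistency level to (write quorum W, read quorum R)."""
    return {
        Consistency.STRONG:   (N, 1),                       # W=N, R=1
        Consistency.EVENTUAL: (1, 1),                       # W=1, R=1
        Consistency.QUORUM:   (N // 2 + 1, N // 2 + 1),     # W=2, R=2 for N=3
    }[level]

# Callers pick the trade-off per operation: strong for account balances,
# eventual for a social feed, quorum for most other traffic.
print(quorum_settings(Consistency.STRONG))    # (3, 1)
print(quorum_settings(Consistency.QUORUM))    # (2, 2)
```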
Real-World Implementations
Amazon DynamoDB
- • Managed key-value store by AWS
- • Consistent hashing with virtual nodes
- • Tunable consistency (eventual → strong)
- • Auto-scaling based on throughput
Redis Cluster
- • In-memory key-value store
- • Hash slots for sharding (16384 slots)
- • Master-slave replication
- • Client-side routing with redirects
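For Redis Cluster specifically, a key's slot is CRC16(key) mod 16384 (using the XMODEM CRC16 variant). A quick sketch in Python, ignoring hash tags like {user} for brevity:

```python
import binascii

def redis_slot(key: bytes) -> int:
    # binascii.crc_hqx computes the same XMODEM-style CRC16 (poly 0x1021, init 0)
    # that Redis Cluster uses for slot assignment; hash tags are not handled here.
    return binascii.crc_hqx(key, 0) % 16384

print(redis_slot(b"user:42"))     # slot in the range [0, 16383]
```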
Apache Cassandra
- • Wide-column store with key-value model
- • Consistent hashing ring topology
- • Tunable consistency levels
- • Gossip protocol for membership
Riak
- • Distributed key-value database
- • Based on Amazon Dynamo design
- • Vector clocks for conflict resolution
- • Multi-datacenter replication