Building a Key-Value Store

Design a distributed key-value store: partitioning, replication, consistency, and fault tolerance.


Problem Statement

Design a distributed key-value store that can handle massive scale with high availability. The system should support basic operations (GET, PUT, DELETE) while providing strong consistency or high availability based on configuration.

Key Requirements

  • Support billions of key-value pairs
  • Handle 10 million QPS with low latency (<10ms)
  • High availability (99.9%+ uptime)
  • Configurable consistency levels
  • Automatic scaling and failure handling

Step 1: Clarify Requirements

Functional Requirements

  • Basic Operations: PUT(key, value), GET(key), DELETE(key) (see the interface sketch below)
  • Key Size: Up to 256 bytes
  • Value Size: Up to 10KB
  • Data Types: Binary data (strings, JSON, etc.)
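To make this contract concrete, the sketch below expresses the operations and size limits as a minimal Python interface. The class, method, and constant names are illustrative assumptions, not taken from any particular product.

```python
from abc import ABC, abstractmethod
from typing import Optional

MAX_KEY_BYTES = 256          # per the functional requirements
MAX_VALUE_BYTES = 10 * 1024  # 10 KB value limit


class KeyValueStore(ABC):
    """Illustrative client-facing contract for the store."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]: ...

    @abstractmethod
    def delete(self, key: bytes) -> None: ...


def validate(key: bytes, value: Optional[bytes] = None) -> None:
    """Reject requests that exceed the documented limits before they hit the cluster."""
    if len(key) == 0 or len(key) > MAX_KEY_BYTES:
        raise ValueError(f"key must be 1-{MAX_KEY_BYTES} bytes")
    if value is not None and len(value) > MAX_VALUE_BYTES:
        raise ValueError(f"value must be <= {MAX_VALUE_BYTES} bytes")
```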

Non-Functional Requirements

  • Scale: 10 billion key-value pairs
  • QPS: 10 million reads, 1 million writes
  • Latency: <10ms P99 for reads, <20ms for writes
  • Availability: 99.9%+ uptime with tunable consistency

Step 2: Back-of-Envelope Estimation

Storage

10B keys × 256B = 2.56 TB
10B values × 5KB avg = 50 TB
Total: ~53 TB
With replication (3x): 159 TB

QPS Breakdown

Reads: 10M QPS
Writes: 1M QPS
Read:Write = 10:1
Peak: 2x average load

Bandwidth

Read: 10M × 5KB = 50 GB/s
Write: 1M × 5KB = 5 GB/s
Total: 55 GB/s
Need high-speed network
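These figures are easy to sanity-check in a few lines of Python, using the same assumptions as above (256-byte keys, 5 KB average values, decimal units, 3x replication):

```python
KEYS = 10_000_000_000         # 10 B key-value pairs
KEY_BYTES = 256               # max key size
AVG_VALUE_BYTES = 5_000       # assumed 5 KB average value
REPLICATION = 3

key_storage = KEYS * KEY_BYTES              # 2.56e12 B  ≈ 2.56 TB
value_storage = KEYS * AVG_VALUE_BYTES      # 5.0e13 B   ≈ 50 TB
total = key_storage + value_storage         # ≈ 53 TB raw
replicated = total * REPLICATION            # ≈ 158 TB with 3 replicas

READ_QPS, WRITE_QPS = 10_000_000, 1_000_000
read_bw = READ_QPS * AVG_VALUE_BYTES / 1e9   # ≈ 50 GB/s
write_bw = WRITE_QPS * AVG_VALUE_BYTES / 1e9 # ≈ 5 GB/s

print(f"storage ≈ {replicated / 1e12:.0f} TB, "
      f"bandwidth ≈ {read_bw + write_bw:.0f} GB/s")
```

Both results point to the same conclusion: storage in the low hundreds of terabytes and roughly 55 GB/s of aggregate bandwidth, so the data must be spread across many nodes connected by a high-speed network.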

Step 3: High-Level Design

Core Components

Data Partitioning

  • Consistent hashing for even distribution
  • Virtual nodes to handle hot spots
  • Automatic rebalancing on node changes

Replication

  • N replicas (typically 3)
  • Replicas stored on different racks/AZs
  • Configurable consistency levels

Coordination

  • Gossip protocol for membership
  • Vector clocks for conflict resolution
  • Anti-entropy for repair

Client Interface

  • Load balancer for request routing
  • Client-side caching
  • Connection pooling

System Architecture

Clients → Load Balancer → Consistent Hash Ring → Storage Nodes (N = 3 replicas) → Anti-Entropy Repair Service

Step 4: Deep Dive

Consistent Hashing Implementation

Virtual Nodes

  • Each physical node maps to 100-200 virtual nodes
  • Reduces hot spots when nodes join/leave
  • Better load distribution across the ring
  • Faster rebalancing with smaller data chunks

Key Benefits

  • Minimal data movement: only ~K/N keys move when one of N nodes is added or removed (see the ring sketch below)
  • Automatic load balancing
  • Handles node failures gracefully
  • Scales horizontally
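The sketch below shows one way to build such a ring in Python, hashing each physical node onto the ring at many virtual points. The vnode count, hash choice (MD5), and method names are illustrative assumptions, not a reference implementation.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hash ring with virtual nodes; the vnode count per node is illustrative."""

    def __init__(self, vnodes: int = 150):
        self.vnodes = vnodes
        self._ring = []    # sorted list of (hash, physical node) points
        self._hashes = []  # parallel sorted list of hashes, for bisect

    def _hash(self, data: str) -> int:
        return int.from_bytes(hashlib.md5(data.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        # Place `vnodes` virtual points for this physical node on the ring.
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            idx = bisect.bisect(self._hashes, h)
            self._hashes.insert(idx, h)
            self._ring.insert(idx, (h, node))

    def remove_node(self, node: str) -> None:
        kept = [(h, n) for h, n in self._ring if n != node]
        self._ring = kept
        self._hashes = [h for h, _ in kept]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash.
        h = self._hash(key)
        idx = bisect.bisect(self._hashes, h) % len(self._hashes)
        return self._ring[idx][1]

    def preference_list(self, key: str, n: int = 3) -> list:
        """First n distinct physical nodes clockwise from the key (its replica set)."""
        idx = bisect.bisect(self._hashes, self._hash(key))
        nodes = []
        for i in range(len(self._ring)):
            node = self._ring[(idx + i) % len(self._ring)][1]
            if node not in nodes:
                nodes.append(node)
                if len(nodes) == n:
                    break
        return nodes
```

Looking up a key walks clockwise to the next virtual node; preference_list returns the N distinct physical nodes that would hold the key's replicas.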

Replication & Consistency

Quorum Consensus

  • N = 3 (total replicas)
  • W = 2 (write quorum)
  • R = 2 (read quorum)
  • W + R > N guarantees that read and write quorums overlap, so a read always reaches at least one replica holding the latest acknowledged write (see the coordinator sketch below)
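A coordinator can enforce these quorums by sending every request to all N replicas and answering once W (for writes) or R (for reads) have responded. The sketch below is hypothetical: it assumes each replica client exposes put(key, versioned_value) and get(key), uses a plain integer version in place of a real vector clock, and ignores timeouts, retries, and read repair.

```python
import concurrent.futures
from dataclasses import dataclass


@dataclass
class VersionedValue:
    value: bytes
    version: int  # stand-in for a vector clock or timestamp


class QuorumCoordinator:
    """Hypothetical coordinator enforcing N/W/R quorums over replica clients."""

    def __init__(self, replicas, n: int = 3, w: int = 2, r: int = 2):
        assert w + r > n, "W + R > N is required for overlapping quorums"
        assert len(replicas) >= n
        self.replicas, self.w, self.r = list(replicas)[:n], w, r
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=n)

    def put(self, key: bytes, value: bytes, version: int) -> bool:
        """Send to all N replicas; succeed as soon as W acknowledge."""
        futures = [self._pool.submit(rep.put, key, VersionedValue(value, version))
                   for rep in self.replicas]
        acks = 0
        for fut in concurrent.futures.as_completed(futures):
            if fut.exception() is None:
                acks += 1
                if acks >= self.w:
                    return True
        return False  # fewer than W replicas acknowledged: the write fails

    def get(self, key: bytes) -> VersionedValue:
        """Read from all N replicas; answer once R respond, returning the freshest."""
        futures = [self._pool.submit(rep.get, key) for rep in self.replicas]
        responses = []
        for fut in concurrent.futures.as_completed(futures):
            if fut.exception() is None and fut.result() is not None:
                responses.append(fut.result())
                if len(responses) >= self.r:
                    break
        if len(responses) < self.r:
            raise RuntimeError("read quorum not reached")
        return max(responses, key=lambda v: v.version)
```

The same class covers the consistency levels listed next: strong reads set (W, R) so that W + R > N, while eventual consistency drops both to 1 and accepts stale reads.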

Consistency Levels

  • Strong: W=N, R=1
  • Eventual: W=1, R=1
  • Balanced: W=2, R=2
  • Configurable per operation

Conflict Resolution

  • Vector clocks for causality tracking (sketched below)
  • Last-write-wins for simplicity
  • Application-level resolution
  • Merkle trees for repair
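A vector clock is a per-node counter map attached to each value; comparing two clocks tells the store whether one version supersedes the other or whether they are concurrent siblings that need resolution. A minimal sketch, assuming clocks are plain dicts of node → counter:

```python
def compare(vc_a: dict, vc_b: dict) -> str:
    """Compare two vector clocks, e.g. {'node-a': 2, 'node-b': 1}."""
    nodes = set(vc_a) | set(vc_b)
    a_behind = any(vc_a.get(n, 0) < vc_b.get(n, 0) for n in nodes)
    b_behind = any(vc_b.get(n, 0) < vc_a.get(n, 0) for n in nodes)
    if a_behind and b_behind:
        return "concurrent"   # conflicting siblings: resolve via LWW or the application
    if a_behind:
        return "a_before_b"   # b supersedes a
    if b_behind:
        return "b_before_a"   # a supersedes b
    return "equal"


def merge(vc_a: dict, vc_b: dict) -> dict:
    """Element-wise max: the clock attached to a value that resolves both versions."""
    return {n: max(vc_a.get(n, 0), vc_b.get(n, 0)) for n in set(vc_a) | set(vc_b)}
```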

Failure Detection & Recovery

Detection Mechanisms

  • Heartbeat messages every 500ms
  • Gossip protocol for membership
  • Phi accrual failure detector (sketched below)
  • Configurable timeout thresholds
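The phi accrual idea is to turn "how long since the last heartbeat" into a suspicion score rather than a binary timeout. The sketch below is a simplified version that assumes exponentially distributed heartbeat gaps (the original detector fits a normal distribution over a sampling window); the class name and threshold are illustrative.

```python
import math
import time
from collections import deque


class PhiFailureDetector:
    """Simplified phi accrual detector (exponential model, not the full algorithm)."""

    def __init__(self, window: int = 100, threshold: float = 8.0):
        self.intervals = deque(maxlen=window)  # recent heartbeat gaps, in seconds
        self.last_heartbeat = None
        self.threshold = threshold             # phi above this => suspect the node

    def heartbeat(self, now: float = None) -> None:
        now = time.monotonic() if now is None else now
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now: float = None) -> float:
        if not self.intervals:
            return 0.0
        now = time.monotonic() if now is None else now
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        # Under an exponential model, P(no heartbeat yet | alive) = e^(-elapsed/mean);
        # phi is -log10 of that probability.
        return elapsed / (mean * math.log(10))

    def is_suspected(self, now: float = None) -> bool:
        return self.phi(now) > self.threshold
```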

Recovery Strategies

  • Temporary failure: route requests to healthy replicas
  • Permanent failure: add a replacement node
  • Hinted handoff for writes (sketched below)
  • Anti-entropy repair for consistency
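Hinted handoff keeps writes available while a replica is down: the coordinator writes to a stand-in node along with a "hint" naming the intended owner, and the hint is replayed once the owner recovers. A hypothetical sketch of the hint buffer a stand-in node might keep:

```python
from collections import defaultdict


class HintedHandoffBuffer:
    """Hypothetical per-node buffer of writes held on behalf of a failed replica."""

    def __init__(self):
        self.hints = defaultdict(list)  # intended owner -> [(key, value), ...]

    def store_hint(self, intended_owner: str, key: bytes, value: bytes) -> None:
        # The write is stored locally, tagged with the replica it was meant for.
        self.hints[intended_owner].append((key, value))

    def replay(self, owner: str, owner_client) -> int:
        """Push buffered writes to the owner once it is healthy again."""
        delivered = 0
        for key, value in self.hints.pop(owner, []):
            owner_client.put(key, value)  # assumed replica client API
            delivered += 1
        return delivered
```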

CAP Theorem Trade-offs

CP System (Consistency + Partition Tolerance)

Config: W=N, R=1
Behavior: Blocks writes during partitions
Use case: Financial systems, user accounts

AP System (Availability + Partition Tolerance)

Config: W=1, R=1
Behavior: Always accepts reads/writes
Use case: Social media, content systems

Tunable Consistency

Config: W=2, R=2
Behavior: Balanced approach
Use case: Most applications, default choice

Real-World Implementations

Amazon DynamoDB

  • Managed key-value store by AWS
  • Consistent hashing with virtual nodes
  • Tunable consistency (eventual → strong)
  • Auto-scaling based on throughput

Redis Cluster

  • In-memory key-value store
  • Hash slots for sharding (16384 slots; see the slot calculation below)
  • Master-replica replication
  • Client-side routing with redirects
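For reference, Redis Cluster maps every key to a slot with CRC16 of the key modulo 16384 (the XMODEM CRC variant). The sketch below reproduces that calculation but ignores hash tags ({...} sections in key names) that Redis uses to co-locate related keys.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 (XMODEM variant: poly 0x1021, init 0) as used for Redis Cluster slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def hash_slot(key: bytes) -> int:
    """HASH_SLOT = CRC16(key) mod 16384 (hash tags not handled in this sketch)."""
    return crc16_xmodem(key) % 16384


# e.g. hash_slot(b"user:1000") returns a slot in [0, 16383] owned by one master
```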

Apache Cassandra

  • Wide-column store with a key-value model
  • Consistent hashing ring topology
  • Tunable consistency levels
  • Gossip protocol for membership

Riak

  • Distributed key-value database
  • Based on the Amazon Dynamo design
  • Vector clocks for conflict resolution
  • Multi-datacenter replication
