What is Consistent Hashing?
Consistent hashing is a distributed hashing scheme that operates independently of the number of servers in a distributed hash table. It solves the problem of redistributing keys when nodes are added or removed from a cluster, minimizing the amount of data that needs to be moved. This makes it essential for building scalable distributed systems like caches, databases, and CDNs.
Developed at MIT in 1997 by David Karger and colleagues, consistent hashing has become fundamental to distributed systems. Unlike traditional modulo hashing, where adding or removing a node forces nearly all keys to be redistributed, consistent hashing remaps only about K/N keys on average (where K is the total number of keys and N is the number of nodes). This property makes it well suited to systems that need to scale dynamically.
How Consistent Hashing Works
1. Hash Ring Construction
Imagine a ring where hash values from 0 to 2^32-1 are distributed clockwise.
Ring: [0 ... 2^32-1] arranged in circle
// Hash nodes onto ring
Node A → hash("NodeA") = 1,234,567
Node B → hash("NodeB") = 2,345,678
Node C → hash("NodeC") = 3,456,789
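A minimal sketch in Python makes this concrete, assuming MD5 reduced to 32 bits as the ring hash (the printed positions will differ from the illustrative numbers above):
import hashlib

def ring_position(name: str) -> int:
    """Map a name onto the 0 .. 2^32-1 ring (MD5 digest reduced to 32 bits)."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % (2 ** 32)

for node in ["NodeA", "NodeB", "NodeC"]:
    print(node, "->", ring_position(node))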
2. Key Assignment
Keys are assigned to the first node clockwise from their hash position.
Key "user:123" → hash = 1,500,000
// Walk clockwise to find first node
1,500,000 → Next node: 2,345,678 (Node B)
Therefore: "user:123" belongs to Node B
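The clockwise walk is a successor search over the sorted ring positions. A sketch using the illustrative positions from step 1 and Python's bisect module:
import bisect

# Illustrative ring positions from step 1 (not real hash outputs).
ring = {1_234_567: "NodeA", 2_345_678: "NodeB", 3_456_789: "NodeC"}
positions = sorted(ring)

def owner(key_hash: int) -> str:
    """Find the first node clockwise from key_hash, wrapping past the end."""
    idx = bisect.bisect_right(positions, key_hash)
    return ring[positions[idx % len(positions)]]

print(owner(1_500_000))  # NodeB, matching the example above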
3. Virtual Nodes
Each physical node gets multiple positions on the ring for better distribution.
// Multiple hash positions per node
Node A:
hash("NodeA:1") = 123,456
hash("NodeA:2") = 987,654
hash("NodeA:3") = 555,555
// 150+ virtual nodes for good distribution
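As a sketch, one physical node's virtual positions can be derived by hashing the node name with a counter suffix, the same "node:i" convention the implementation below uses:
import hashlib

def virtual_positions(node: str, count: int = 150):
    """Yield `count` ring positions for one physical node's virtual nodes."""
    for i in range(count):
        yield int(hashlib.md5(f"{node}:{i}".encode()).hexdigest(), 16) % (2 ** 32)

print(sorted(virtual_positions("NodeA"))[:3])  # a few of NodeA's 150 positions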
4. Node Addition/Removal
Only keys between affected nodes need to move.
Before: A(1.2M) → B(2.3M)
After: A(1.2M) → D(1.8M) → B(2.3M)
// Only keys in range [1.2M, 1.8M] move
// From B to new Node D (~K/N keys)
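Using the illustrative positions above, the affected range can be checked directly; everything outside it keeps its current owner:
# Node D joins at position 1,800,000 between A (1,234,567) and B (2,345,678).
def moves_to_node_d(key_hash: int) -> bool:
    """Keys in (1,234,567 .. 1,800,000] switch from Node B to the new Node D."""
    return 1_234_567 < key_hash <= 1_800_000

print(moves_to_node_d(1_500_000))  # True  -> "user:123" now belongs to Node D
print(moves_to_node_d(2_000_000))  # False -> still belongs to Node B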
Consistent vs Traditional Hashing
Traditional Hashing
hash(key) % N servers
N=3 → N=4 changes 75% of mappings
Almost all data needs redistribution
N=4 → N=3 changes 75% of mappings
Massive data movement required
❌ Cache invalidation storms
❌ High redistribution cost
Consistent Hashing
Keys map to fixed positions on the ring
Adding a node moves only ~K/N keys
Minimal redistribution when scaling up
Removing a node moves only that node's keys
They shift to the next node clockwise
✅ Minimal cache invalidation
✅ Predictable data movement
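The 75% figure is easy to verify with a small simulation. This sketch hashes a made-up key set with MD5 and compares modulo placement for N=3 against N=4:
import hashlib

def mod_bucket(key: str, n: int) -> int:
    """Traditional placement: hash the key, then take it modulo the server count."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

keys = [f"user:{i}" for i in range(100_000)]
moved = sum(mod_bucket(k, 3) != mod_bucket(k, 4) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped")  # ~75%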
Real-World Implementations
Amazon DynamoDB
Uses consistent hashing for data distribution across partitions.
- Virtual nodes for load balancing
- Preference lists for replication
- Automatic repartitioning
- Handles 10 trillion requests/day
Apache Cassandra
Distributes data across cluster using consistent hashing.
- Token ring architecture
- Virtual nodes (vnodes)
- Automatic data distribution
- Linear scalability
Memcached
Client libraries use consistent hashing for server selection.
- Ketama algorithm
- Client-side implementation
- Minimal cache misses on scaling
- Used by Facebook, Twitter
Akamai CDN
Content routing and server selection using consistent hashing.
- Global content distribution
- Edge server selection
- Load balancing at scale
- Handles 30% of web traffic
Implementation Example
Basic Consistent Hash Ring (Python)
import hashlib
import bisect
class ConsistentHash:
    def __init__(self, nodes=None, virtual_nodes=150):
        self.virtual_nodes = virtual_nodes
        self.ring = {}            # ring position -> physical node
        self.sorted_keys = []     # sorted ring positions for binary search
        self.nodes = nodes or []
        for node in self.nodes:
            self.add_node(node)

    def _hash(self, key):
        """Generate hash for a key"""
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        """Add node with virtual nodes to the ring"""
        for i in range(self.virtual_nodes):
            virtual_key = f"{node}:{i}"
            hash_value = self._hash(virtual_key)
            self.ring[hash_value] = node
            bisect.insort(self.sorted_keys, hash_value)

    def remove_node(self, node):
        """Remove node and its virtual nodes"""
        for i in range(self.virtual_nodes):
            virtual_key = f"{node}:{i}"
            hash_value = self._hash(virtual_key)
            del self.ring[hash_value]
            self.sorted_keys.remove(hash_value)

    def get_node(self, key):
        """Find node responsible for key"""
        if not self.ring:
            return None
        hash_value = self._hash(key)
        # Find first node clockwise from hash
        idx = bisect.bisect_right(self.sorted_keys, hash_value)
        if idx == len(self.sorted_keys):
            idx = 0  # Wrap around
        return self.ring[self.sorted_keys[idx]]

# Usage
ch = ConsistentHash(['server1', 'server2', 'server3'])
print(ch.get_node('user:123'))  # Returns responsible server
ch.add_node('server4')          # Minimal data movement
ch.remove_node('server2')       # Only server2's data moves
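The ~K/N property can also be measured with the class above. This sketch (over a made-up key set) records each key's owner, adds a fourth server, and counts how many keys changed hands; with N=4, roughly a quarter should move, all of them to the new node:
keys = [f"user:{i}" for i in range(100_000)]
ch = ConsistentHash(['server1', 'server2', 'server3'])
before = {k: ch.get_node(k) for k in keys}

ch.add_node('server4')
moved = sum(before[k] != ch.get_node(k) for k in keys)
print(f"{moved / len(keys):.1%} of keys moved")  # roughly K/N = 25%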
Advanced Concepts
Jump Consistent Hash
Google's algorithm with no memory overhead.
• Zero memory for ring
• Deterministic placement
• Better for fixed node counts
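The published algorithm is only a few lines. Below is a sketch transcribed into Python from the Lamping and Veach paper; the key must already be an integer (e.g. a 64-bit hash of the string key):
def jump_hash(key: int, num_buckets: int) -> int:
    """Jump consistent hash: map a 64-bit integer key to a bucket in [0, num_buckets)."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF  # 64-bit LCG step
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b

print(jump_hash(123456789, 10))  # adding a bucket later remaps only ~1/num_buckets of keys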
Maglev Hashing
Google's load balancer consistent hashing.
• Lookup table based
• Better load distribution
• Used in Google Cloud LB
Rendezvous Hashing
Highest Random Weight (HRW) algorithm.
• Weighted node selection
• Simpler implementation
• Good for small clusters
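Rendezvous hashing needs no ring: every node scores the key and the highest score wins, so removing a node remaps only the keys it owned. A minimal sketch, assuming MD5 for scoring:
import hashlib

def hrw_owner(key: str, nodes: list) -> str:
    """Highest Random Weight: every node scores the key; the top score owns it."""
    def score(node):
        return int(hashlib.md5(f"{node}:{key}".encode()).hexdigest(), 16)
    return max(nodes, key=score)

print(hrw_owner("user:123", ["server1", "server2", "server3"]))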
Bounded Load
Consistent hashing with bounded loads.
• Capacity constraints
• Better balance
• Used in modern systems
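A minimal sketch of the bounded-load idea, built on the ConsistentHash class from the implementation example above: walk clockwise as usual, but skip nodes that have already reached a capacity cap (the loads dict and capacity value are illustrative and maintained by the caller):
import bisect

def get_node_bounded(ch, key, loads, capacity):
    """Like ch.get_node, but skip nodes whose current load has hit `capacity`."""
    if not ch.ring:
        return None
    start = bisect.bisect_right(ch.sorted_keys, ch._hash(key))
    for step in range(len(ch.sorted_keys)):
        node = ch.ring[ch.sorted_keys[(start + step) % len(ch.sorted_keys)]]
        if loads.get(node, 0) < capacity:
            return node
    return None  # every node is at capacity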
When to Use Consistent Hashing
✅ Good Use Cases
- Distributed caching systems
- Load balancing with session affinity
- Distributed databases and storage
- Content delivery networks (CDN)
- Distributed hash tables (DHT)
- Service discovery and routing
- Sharding in microservices
❌ Not Suitable For
- Small, fixed number of nodes
- When perfect balance is critical
- Ordered data requirements
- Systems with no scaling needs
- When range queries are common
- Strict consistency requirements
- When simple round-robin is sufficient
Consistent Hashing Best Practices
Use 150-200 virtual nodes per physical node for good distribution
Monitor load: track the standard deviation of keys per node and rebalance if needed
Plan for failures: implement replica placement and failover strategies
Pick a uniform hash function such as MD5 or MurmurHash
Weight nodes by capacity: assign more virtual nodes to powerful servers (see the sketch below)
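For the last point, one simple approach is to scale the virtual-node count by a per-server weight. A sketch building on the ConsistentHash class above; the weight parameter is illustrative, and removal would need the same weight to clean up fully:
import bisect

def add_weighted_node(ch, node, weight=1.0):
    """Give a more powerful server proportionally more virtual nodes."""
    for i in range(int(ch.virtual_nodes * weight)):
        position = ch._hash(f"{node}:{i}")
        ch.ring[position] = node
        bisect.insort(ch.sorted_keys, position)

ch = ConsistentHash(['server1', 'server2'])
add_weighted_node(ch, 'big-server', weight=2.0)  # ~2x the share of keys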