What is Consistent Hashing?

Consistent hashing is a distributed hashing scheme that operates independently of the number of servers in a distributed hash table. It solves the problem of redistributing keys when nodes are added or removed from a cluster, minimizing the amount of data that needs to be moved. This makes it essential for building scalable distributed systems like caches, databases, and CDNs.

Developed at MIT in 1997, consistent hashing has become fundamental to distributed systems. Unlike traditional modulo-based hashing, where adding or removing a node forces most keys onto different servers, consistent hashing relocates only about K/N keys on average (where K is the total number of keys and N is the number of nodes). This property makes it well suited to systems that need to scale dynamically.


How Consistent Hashing Works

1. Hash Ring Construction

Imagine a ring where the hash space from 0 to 2^32-1 is laid out clockwise; both nodes and keys are hashed onto positions on this ring.

// Create hash ring
Ring: [0 ... 2^32-1] arranged in circle

// Hash nodes onto ring
Node A → hash("NodeA") = 1,234,567
Node B → hash("NodeB") = 2,345,678
Node C → hash("NodeC") = 3,456,789
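
A rough sketch of this step in Python, assuming MD5 truncated to 32 bits as the ring hash (the node names are placeholders, and the actual positions will differ from the illustrative numbers above):

import hashlib

RING_SIZE = 2**32  # ring positions 0 .. 2^32 - 1

def ring_hash(key):
    """Map an arbitrary string to a position on the 32-bit ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

# Place each node on the ring by hashing its name
for node in ["NodeA", "NodeB", "NodeC"]:
    print(node, "->", ring_hash(node))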

2. Key Assignment

Keys are assigned to the first node clockwise from their hash position.

// Hash key and find owner
Key "user:123" → hash = 1,500,000

// Walk clockwise to find first node
1,500,000 → Next node: 2,345,678 (Node B)
Therefore: "user:123" belongs to Node B
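
Continuing the sketch above, the clockwise walk is just a search over the sorted node positions; bisect finds the first position at or after the key's hash, wrapping to the start of the list when the key hashes past the last node:

import bisect
import hashlib

RING_SIZE = 2**32

def ring_hash(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

nodes = {ring_hash(n): n for n in ["NodeA", "NodeB", "NodeC"]}
positions = sorted(nodes)  # ring positions in clockwise order

def owner(key):
    """Return the first node clockwise from the key's hash position."""
    idx = bisect.bisect_right(positions, ring_hash(key))
    if idx == len(positions):
        idx = 0  # wrap around past the top of the ring
    return nodes[positions[idx]]

print(owner("user:123"))  # the owning node, depending on the actual hash values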

3. Virtual Nodes

Each physical node gets multiple positions on the ring for better distribution.

Virtual Node Example
// Multiple hash positions per node
Node A:
  hash("NodeA:1") = 123,456
  hash("NodeA:2") = 987,654
  hash("NodeA:3") = 555,555
// 150+ virtual nodes for good distribution
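
One way to see why this matters is to measure how much of the ring each physical node owns with and without virtual nodes; a small sketch reusing the MD5-based ring hash from above:

import hashlib

RING_SIZE = 2**32

def ring_hash(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

def ring_share(nodes, vnodes):
    """Fraction of the ring owned by each physical node."""
    ring = sorted((ring_hash(f"{n}:{i}"), n) for n in nodes for i in range(vnodes))
    share = {n: 0 for n in nodes}
    for (pos, _), (nxt_pos, nxt_node) in zip(ring, ring[1:] + ring[:1]):
        share[nxt_node] += (nxt_pos - pos) % RING_SIZE  # arc owned by the next node clockwise
    return {n: round(s / RING_SIZE, 3) for n, s in share.items()}

print(ring_share(["NodeA", "NodeB", "NodeC"], 1))    # shares can be very uneven
print(ring_share(["NodeA", "NodeB", "NodeC"], 150))  # each close to 1/3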

4. Node Addition/Removal

Only keys between affected nodes need to move.

// Adding Node D between A and B
Before: A(1.2M) → B(2.3M)
After: A(1.2M) → D(1.8M) → B(2.3M)

// Only keys in range [1.2M, 1.8M] move
// From B to new Node D (~K/N keys)
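
The affected range can be computed directly: the new node takes over only the arc between its predecessor on the ring and itself. A small sketch, reusing the ring hash from the earlier examples:

import bisect
import hashlib

RING_SIZE = 2**32

def ring_hash(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

positions = sorted(ring_hash(n) for n in ["NodeA", "NodeB", "NodeC"])
new_pos = ring_hash("NodeD")
idx = bisect.bisect_left(positions, new_pos)
predecessor = positions[idx - 1]  # wraps to the last position when idx == 0
print(f"Keys hashing into ({predecessor}, {new_pos}] move to NodeD; everything else stays put")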

Consistent vs Traditional Hashing

Traditional Hashing

hash(key) % N servers

Adding a node:
N=3 → N=4 changes 75% of mappings
Almost all data needs redistribution
Removing a node:
N=4 → N=3 changes 75% of mappings
Massive data movement required
❌ Poor for dynamic systems
❌ Cache invalidation storms
❌ High redistribution cost

Consistent Hashing

Keys map to ring positions

Adding a node:
Only ~K/N keys move
Minimal redistribution
Removing a node:
Only that node's keys move
To next node clockwise
✅ Perfect for elastic scaling
✅ Minimal cache invalidation
✅ Predictable data movement
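
The 75% figure is easy to check empirically; a small simulation of modulo placement over synthetic keys (the key names are arbitrary):

import hashlib

def bucket(key, n):
    """Traditional placement: hash the key and take it modulo the server count."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

keys = [f"user:{i}" for i in range(100_000)]
moved = sum(bucket(k, 3) != bucket(k, 4) for k in keys)
print(f"{moved / len(keys):.1%} of keys remapped going from 3 to 4 servers")
# Prints roughly 75%, versus ~25% (about K/N) with consistent hashing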

Real-World Implementations

Amazon DynamoDB

Uses consistent hashing for data distribution across partitions.

  • Virtual nodes for load balancing
  • Preference lists for replication
  • Automatic repartitioning
  • Handles 10 trillion requests/day

Apache Cassandra

Distributes data across cluster using consistent hashing.

  • Token ring architecture
  • Virtual nodes (vnodes)
  • Automatic data distribution
  • Linear scalability

Memcached

Client libraries use consistent hashing for server selection.

  • Ketama algorithm
  • Client-side implementation
  • Minimal cache misses on scaling
  • Used by Facebook, Twitter

Akamai CDN

Content routing and server selection using consistent hashing.

  • Global content distribution
  • Edge server selection
  • Load balancing at scale
  • Handles 30% of web traffic

Implementation Example

Basic Consistent Hash Ring (Python)

Basic Consistent Hash Ring Implementation
import hashlib
import bisect

class ConsistentHash:
    def __init__(self, nodes=None, virtual_nodes=150):
        self.virtual_nodes = virtual_nodes
        self.ring = {}
        self.sorted_keys = []
        self.nodes = nodes or []
        
        for node in self.nodes:
            self.add_node(node)

    def _hash(self, key):
        """Generate hash for a key"""
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        """Add node with virtual nodes to the ring"""
        for i in range(self.virtual_nodes):
            virtual_key = f"{node}:{i}"
            hash_value = self._hash(virtual_key)
            self.ring[hash_value] = node
            bisect.insort(self.sorted_keys, hash_value)

    def remove_node(self, node):
        """Remove node and its virtual nodes"""
        for i in range(self.virtual_nodes):
            virtual_key = f"{node}:{i}"
            hash_value = self._hash(virtual_key)
            del self.ring[hash_value]
            self.sorted_keys.remove(hash_value)

    def get_node(self, key):
        """Find node responsible for key"""
        if not self.ring:
            return None
        
        hash_value = self._hash(key)
        # Find first node clockwise from hash
        idx = bisect.bisect_right(self.sorted_keys, hash_value)
        if idx == len(self.sorted_keys):
            idx = 0  # Wrap around
        return self.ring[self.sorted_keys[idx]]

# Usage
ch = ConsistentHash(['server1', 'server2', 'server3'])
print(ch.get_node('user:123'))  # Returns responsible server
ch.add_node('server4')  # Minimal data movement
ch.remove_node('server2')  # Only server2's data moves
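
As a quick check of the K/N property, the class above can be used to count how many keys actually change owner when a node joins (the key names are arbitrary):

ch = ConsistentHash(['server1', 'server2', 'server3'])
keys = [f"user:{i}" for i in range(100_000)]
before = {k: ch.get_node(k) for k in keys}

ch.add_node('server4')
moved = sum(ch.get_node(k) != before[k] for k in keys)
print(f"{moved / len(keys):.1%} of keys moved")  # roughly a quarter of the keys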

Advanced Concepts

Jump Consistent Hash

Google's algorithm with no memory overhead.

• O(ln n) time complexity
• Zero memory for ring
• Deterministic placement
• Better for fixed node counts
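
The published algorithm (Lamping and Veach) is only a few lines; a direct Python transcription, offered as a sketch rather than a drop-in replacement for a ring:

def jump_consistent_hash(key, num_buckets):
    """Map a 64-bit integer key to a bucket in [0, num_buckets) with no ring state."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF  # 64-bit LCG step
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b

print(jump_consistent_hash(123456789, 10))  # stable bucket for this key
print(jump_consistent_hash(123456789, 11))  # usually unchanged when buckets grow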

Maglev Hashing

Google's load balancer consistent hashing.

• Minimal disruption
• Lookup table based
• Better load distribution
• Used in Google Cloud LB

Rendezvous Hashing

Highest Random Weight (HRW) algorithm.

• No ring structure
• Weighted node selection
• Simpler implementation
• Good for small clusters
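
A minimal rendezvous (HRW) sketch, assuming an MD5-based score (the helper and server names are illustrative): every node is scored against the key and the highest score wins, so no ring structure is needed.

import hashlib

def score(key, node):
    """Combined hash of key and node; the highest-scoring node owns the key."""
    return int(hashlib.md5(f"{key}:{node}".encode()).hexdigest(), 16)

def rendezvous_owner(key, nodes):
    return max(nodes, key=lambda node: score(key, node))

print(rendezvous_owner("user:123", ["server1", "server2", "server3"]))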

Bounded Load

Consistent hashing with bounded loads.

• Prevents overload
• Capacity constraints
• Better balance
• Used in modern systems

When to Use Consistent Hashing

✅ Good Use Cases

  • Distributed caching systems
  • Load balancing with session affinity
  • Distributed databases and storage
  • Content delivery networks (CDN)
  • Distributed hash tables (DHT)
  • Service discovery and routing
  • Sharding in microservices

❌ Not Suitable For

  • Small, fixed number of nodes
  • When perfect balance is critical
  • Ordered data requirements
  • Systems with no scaling needs
  • When range queries are common
  • Strict consistency requirements
  • When simple round-robin is sufficient

Consistent Hashing Best Practices

Use enough virtual nodes:

150-200 virtual nodes per physical node for good distribution

Monitor load distribution:

Track standard deviation and rebalance if needed

Handle node failures gracefully:

Implement replica placement and failover strategies

Use good hash functions:

MD5 or MurmurHash for uniform distribution

Consider weighted nodes:

Assign more virtual nodes to powerful servers
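
For example, with the MD5-based ring used throughout this article, a node's weight can simply scale its virtual-node count (the weights and server names below are made up):

import hashlib

def ring_hash(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 2**32

def build_weighted_ring(weights, base_vnodes=150):
    """weights: node -> relative capacity; heavier nodes get more ring positions."""
    ring = {}
    for node, weight in weights.items():
        for i in range(int(base_vnodes * weight)):
            ring[ring_hash(f"{node}:{i}")] = node
    return ring

ring = build_weighted_ring({"small-server": 1.0, "big-server": 2.0})
# big-server ends up with roughly 2/3 of the ring, and therefore ~2/3 of the keys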
