Replication & Sharding

Master database distribution patterns for scalable, fault-tolerant data storage systems

40 min read

Database Distribution Calculator

Example configuration: 1000 GB dataset, 3 replicas, 4 shards

Storage Metrics

Total Storage Needed: 3000 GB
Data Per Shard: 250 GB
Storage Per Shard: 750 GB

Performance Metrics

Write Latency: 60 ms
Read Latency: 20 ms
System Availability: 100.000%

Scalability Metrics

Read Scalability: 3x
Write Scalability: 4x
Hotspot Risk: Low

Database Distribution Fundamentals

As applications scale, single database instances become bottlenecks. Database distribution through replication and sharding provides the foundation for building systems that can handle massive scale while maintaining performance and availability.

Replication

Creating copies of data across multiple nodes to improve availability, fault tolerance, and read performance.

Sharding

Partitioning data across multiple database instances to distribute load and enable horizontal scaling.

Replication Patterns

Master-Slave Replication

One master handles all writes while slaves serve reads. Simple and consistent, but the master becomes a write bottleneck.

✓ Advantages:
  • Strong consistency
  • Simple conflict resolution
  • Read scaling
# Write to master
INSERT INTO users ...
# Propagated to slaves
# Read from slaves
SELECT * FROM users ...
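
As a minimal sketch of this routing, assuming hypothetical master and replica connection objects that expose an execute() method, writes go to the master and reads rotate across the slaves:

# Minimal read/write routing sketch (connection objects are assumed)
import itertools

class ReplicatedDatabase:
    def __init__(self, master, replicas):
        self.master = master                         # single write endpoint
        self._replicas = itertools.cycle(replicas)   # round-robin over read endpoints

    def execute_write(self, sql, params=()):
        # All writes go to the master; replication propagates them to the slaves
        return self.master.execute(sql, params)

    def execute_read(self, sql, params=()):
        # Reads are served by replicas, which may lag slightly behind the master
        return next(self._replicas).execute(sql, params)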

Master-Master Replication

Multiple masters accept writes, providing better write scalability but requiring conflict resolution.

⚠ Challenges:
  • Write conflicts
  • Complex consistency
  • Split-brain scenarios
# Master A
UPDATE user SET email = 'a@ex.com'
# Master B (conflict!)
UPDATE user SET email = 'b@ex.com'
# Conflict resolution needed
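
One common (if lossy) resolution strategy is last-write-wins; the sketch below assumes each write carries a timestamp, with a node id used only as a tie-breaker:

# Last-write-wins conflict resolution sketch (field names are illustrative)
from dataclasses import dataclass

@dataclass
class Version:
    value: str
    timestamp: float   # wall-clock or hybrid logical clock of the write
    node_id: str       # tie-breaker when timestamps collide

def resolve(a: Version, b: Version) -> Version:
    # Keep the newer write; the losing write is silently discarded, which is
    # why LWW only suits data that tolerates occasional lost updates.
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))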

Leaderless Replication

No single master; clients write to multiple replicas. Uses quorum reads and writes for consistency.

📊 Quorum Formula:
R + W > N (for strong consistency)
# N=5, W=3, R=3
Write to 3 out of 5 replicas
Read from 3 out of 5 replicas
# Guaranteed consistency
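
A toy sketch of quorum writes and reads under the R + W > N rule; the replica objects and their put/get methods are assumptions for illustration:

# Quorum read/write sketch: N replicas, write quorum W, read quorum R
def quorum_write(replicas, key, value, ts, W):
    acks = sum(1 for r in replicas if r.put(key, value, ts))   # put() returns True on success
    if acks < W:
        raise RuntimeError(f"write rejected: {acks} acks, needed {W}")

def quorum_read(replicas, key, R):
    responses = [r.get(key) for r in replicas[:R]]             # each response: (value, timestamp)
    # R + W > N guarantees at least one of these replicas saw the latest
    # successful write, so the freshest timestamp wins.
    return max(responses, key=lambda resp: resp[1])[0]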

Sharding Strategies

Hash-Based Sharding

Uses a hash function to determine data placement. Provides even distribution but makes range queries difficult.

# Python implementation
# Use a deterministic hash: the built-in hash() is randomized per
# process for strings, which would scatter keys across restarts.
import zlib

shard = zlib.crc32(str(user_id).encode()) % num_shards
db = databases[shard]
db.insert(user_data)
✓ Pros:
  • Even data distribution
  • No hotspots
  • Simple implementation
✗ Cons:
  • Difficult range queries
  • Resharding complexity
  • No locality

Range-Based Sharding

Partitions data based on ranges of shard keys. Efficient for range queries but prone to hotspots.

# Shard mapping
Shard 1: user_id 1-1000
Shard 2: user_id 1001-2000
Shard 3: user_id 2001-3000
✓ Pros:
  • Efficient range queries
  • Data locality
  • Intuitive partitioning
✗ Cons:
  • Hotspot potential
  • Uneven distribution
  • Sequential key issues
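
As a minimal sketch, the range mapping above can be implemented as a lookup over sorted upper bounds (the boundary values are the same illustrative ones):

# Range-based shard lookup over sorted upper bounds (illustrative boundaries)
import bisect

upper_bounds = [1000, 2000, 3000]   # Shard 1: 1-1000, Shard 2: 1001-2000, Shard 3: 2001-3000

def shard_for(user_id):
    idx = bisect.bisect_left(upper_bounds, user_id)
    if idx == len(upper_bounds):
        raise KeyError(f"user_id {user_id} falls outside every range")
    return idx + 1   # shard numbers above are 1-based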

Directory-Based Sharding

Maintains a lookup service that maps keys to shards. Flexible but adds complexity and latency.

# Directory lookup
shard = directory.lookup(user_id)
db = databases[shard]
result = db.query(user_id)
✓ Pros:
  • Flexible mapping
  • Dynamic rebalancing
  • Supports complex placement strategies
✗ Cons:
  • Directory bottleneck
  • Added complexity
  • Single point of failure
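
A minimal directory sketch, where an in-memory dict stands in for whatever lookup service actually stores the mapping:

# Directory-based sharding sketch: an explicit key -> shard mapping
class ShardDirectory:
    def __init__(self):
        self._map = {}   # user_id -> shard id; stands in for a real lookup service

    def assign(self, user_id, shard):
        self._map[user_id] = shard

    def lookup(self, user_id):
        return self._map[user_id]

    def move(self, user_id, new_shard):
        # Dynamic rebalancing is just an update here; a real system must also
        # copy the data to the new shard before flipping the mapping.
        self._map[user_id] = new_shard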

Advanced Distribution Patterns

Consistent Hashing

Minimizes data movement when nodes are added or removed by placing both data and nodes on a hash ring.

# Virtual nodes for balance
virtual_nodes_per_server = 150
total_virtual_nodes = servers * 150
data_movement = 1/servers
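
A compact hash ring sketch with virtual nodes; md5 is used only because it is deterministic, and the node names in the usage comment are hypothetical:

# Consistent hash ring with virtual nodes (minimal sketch)
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=150):
        self.ring = {}    # ring position -> physical node
        self.keys = []    # sorted ring positions
        for node in nodes:
            for i in range(vnodes):
                pos = self._hash(f"{node}#{i}")
                self.ring[pos] = node
                bisect.insort(self.keys, pos)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's position
        pos = self._hash(str(key))
        idx = bisect.bisect(self.keys, pos) % len(self.keys)
        return self.ring[self.keys[idx]]

# e.g. HashRing(["db1", "db2", "db3"]).node_for(user_id)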

Multi-Dimensional Sharding

Partitions data across multiple dimensions (e.g., geography + time) for better query performance.

# Geographic + temporal
shard = hash(region) + hash(date)
partition = shard % total_shards

Read Replicas with Write Sharding

Combines horizontal write scaling through sharding with horizontal read scaling through replication.

Master shards handle writes
Read replicas scale queries
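
Combining the two, a routing sketch might pick the shard first, then use that shard's master for writes and one of its replicas for reads (the shard dictionaries and connection objects are assumptions):

# Route writes to a shard's master and reads to one of its replicas
import random
import zlib

def route(shards, user_id, for_write):
    # shards: list of {"master": conn, "replicas": [conn, ...]} dicts (illustrative shape)
    shard = shards[zlib.crc32(str(user_id).encode()) % len(shards)]
    return shard["master"] if for_write else random.choice(shard["replicas"])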

Cross-Shard Transactions

Handles ACID transactions across multiple shards using two-phase commit or saga patterns.

⚠ Complexity increases significantly
• Two-phase commit protocol
• Distributed transaction coordinator
• Saga pattern for long transactions
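
A stripped-down coordinator sketch of two-phase commit, assuming each participant shard exposes prepare/commit/rollback methods:

# Two-phase commit coordinator sketch
def two_phase_commit(participants, txn):
    # Phase 1 (prepare): every shard must vote yes before anything commits
    prepared = []
    try:
        for p in participants:
            if not p.prepare(txn):
                raise RuntimeError("participant voted no")
            prepared.append(p)
    except Exception:
        for p in prepared:
            p.rollback(txn)
        return False
    # Phase 2 (commit): once all votes are in, commit everywhere
    for p in participants:
        p.commit(txn)
    return True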

Real-World Implementations

Instagram

• Hash-based sharding by user ID
• 4000+ database servers
• Custom consistent hashing
• Read replicas for photo serving

MongoDB

• Range or hash-based sharding
• Replica sets for fault tolerance
• Automatic balancing
• mongos routing layer

Cassandra

• Consistent hashing ring
• Configurable replication factor
• Tunable consistency (CL.ONE to CL.ALL)
• Virtual nodes for balance

YouTube

• Geographic sharding
• CDN for read scaling
• MySQL with Vitess
• Video metadata sharding

Discord

• Guild-based sharding
• Cassandra for messages
• Read replicas for guilds
• Hot partition migration

Uber

• Geographic sharding (city-based)
• MySQL with Schemaless
• Time-based partitioning
• Cross-region replication

Implementation Best Practices

🎯 Choose the Right Strategy

  • Analyze query patterns first
  • Consider data growth patterns
  • Evaluate consistency requirements
  • Plan for operational complexity

📊 Monitor Key Metrics

  • Shard size distribution
  • Query response times
  • Replication lag
  • Cross-shard query frequency

⚡ Optimize Performance

  • Denormalize for single-shard queries
  • Use connection pooling
  • Implement query caching
  • Batch cross-shard operations

🛡️ Plan for Failures

  • Implement health checks
  • Automate failover procedures
  • Design for split-brain scenarios
  • Test disaster recovery regularly

🔧 Operational Excellence

  • Automate shard management
  • Document runbooks clearly
  • Train teams on procedures
  • Use infrastructure as code

📈 Plan for Growth

  • Design resharding procedures
  • Implement gradual migrations
  • Monitor capacity continuously
  • Plan for 10x growth
