Beyond Basic Scaling
When systems grow beyond millions of users, simple horizontal and vertical scaling isn't enough. You need sophisticated patterns for data distribution, consistency management, and failure handling. This lesson covers the advanced scaling techniques used by companies like Google, Amazon, and Facebook to handle billions of requests daily.
Key Principle: At scale, everything fails. Design for failure, partition tolerance, and graceful degradation from day one.
CAP Theorem & Trade-offs
Consistency
All nodes see the same data simultaneously
• Sequential consistency
• Causal consistency
• Read-your-writes
Availability
Every request receives a response, even while some nodes are failing
• No single point of failure
• Graceful degradation
• Fast failover
Partition Tolerance
System continues despite network failures
• Message loss
• Node isolation
• Async replication
CP Systems (Consistency + Partition Tolerance)
Sacrifice availability for consistency during network partitions.
Example systems:
- • MongoDB
- • Redis (with persistence)
- • HBase
- • ZooKeeper
Typical use cases:
- • Financial transactions
- • Inventory management
- • Configuration management
- • Leader election
AP Systems (Availability + Partition Tolerance)
Sacrifice consistency for availability during network partitions.
Example systems:
- • Cassandra
- • DynamoDB
- • CouchDB
- • Riak
Typical use cases:
- • Social media feeds
- • Shopping carts
- • User sessions
- • Content delivery
Database Sharding Strategies
Range-Based Sharding
Data is partitioned based on ranges of the shard key.
Shard 1: user_id 0-999,999
Shard 2: user_id 1,000,000-1,999,999
Shard 3: user_id 2,000,000-2,999,999
Pros:
- • Simple implementation
- • Efficient range queries
- • Easy to understand
Cons:
- • Uneven distribution
- • Hot spots possible
- • Manual rebalancing
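A minimal Python sketch of the range routing above, assuming the shard boundaries are known to the router (real systems keep this mapping in a metadata service):

import bisect

# Upper bounds (exclusive) of each shard's user_id range, kept sorted.
SHARD_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]

def shard_for(user_id: int) -> int:
    """Return the shard number whose range contains user_id."""
    idx = bisect.bisect_right(SHARD_UPPER_BOUNDS, user_id)
    if idx >= len(SHARD_UPPER_BOUNDS):
        raise ValueError(f"user_id {user_id} is outside all shard ranges")
    return idx + 1  # shards are numbered from 1 in the example above

print(shard_for(1_500_000))  # → 2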
Hash-Based Sharding
Use a hash function to determine shard placement.
shard_id = hash(user_id) % num_shards
// user_123 → hash(123) % 4 = shard_2
// user_456 → hash(456) % 4 = shard_0
Pros:
- • Even distribution
- • No hot spots
- • Automatic balancing
Cons:
- • No range queries
- • Resharding is complex
- • Related data scattered
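A sketch of the hash-based placement above. Python's built-in hash() for strings is randomized per process, so a stable digest such as MD5 is assumed here; the last lines illustrate why naive modulo resharding is painful:

import hashlib

def shard_for(user_id: str, num_shards: int) -> int:
    # A stable hash keeps each key on the same shard across restarts.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % num_shards

print(shard_for("user_123", 4))  # a shard index in 0..3

# Naive modulo resharding: growing from 4 to 5 shards moves roughly 80% of keys,
# which is why hash-sharded systems prefer consistent hashing (see the DynamoDB section below).
moved = sum(shard_for(f"user_{i}", 4) != shard_for(f"user_{i}", 5) for i in range(10_000))
print(f"{moved / 10_000:.0%} of keys would move")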
Geographic Sharding
Data is partitioned based on geographic location.
US-East: users.country IN ('US') AND state IN ('NY','MA'...)
US-West: users.country IN ('US') AND state IN ('CA','OR'...)
Europe: users.country IN ('UK','DE','FR'...)
Pros:
- • Low latency for users
- • Data sovereignty
- • Regional compliance
Cons:
- • Cross-region queries slow
- • Uneven growth
- • Complex failover
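A toy routing table for the geographic split above; the region names and the country/state mapping are illustrative only:

# Illustrative mapping only; real systems keep this in configuration or a metadata service.
REGION_BY_COUNTRY = {"UK": "europe", "DE": "europe", "FR": "europe"}
US_WEST_STATES = {"CA", "OR", "WA"}

def region_for(country: str, state: str = "") -> str:
    if country == "US":
        return "us-west" if state in US_WEST_STATES else "us-east"
    return REGION_BY_COUNTRY.get(country, "us-east")  # default/fallback region

print(region_for("US", "CA"))  # → us-west
print(region_for("DE"))        # → europe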
Replication Strategies
Master-Slave Replication
One master handles writes, slaves handle reads.
Master (Write)
├── Slave 1 (Read)
├── Slave 2 (Read)
└── Slave 3 (Read)
- • Simple to implement
- • Read scaling only
- • Single point of failure
- • Replication lag issues
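A minimal read/write splitter matching this topology; the connection objects and their execute() method are assumptions, not any specific driver's API:

import random

class ReplicatedDB:
    """Route writes to the master and reads to a replica (illustrative only)."""
    def __init__(self, master, replicas):
        self.master = master
        self.replicas = replicas

    def write(self, query, *params):
        return self.master.execute(query, *params)

    def read(self, query, *params):
        # Replica reads may lag the master by the replication delay.
        return random.choice(self.replicas).execute(query, *params)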
Master-Master Replication
Multiple masters can handle writes with conflict resolution.
Master 1 ←→ Master 2
    ↑           ↑
    ↓           ↓
Master 3 ←→ Master 4
- • Write scaling
- • No single point of failure
- • Conflict resolution needed
- • Complex consistency
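One common (and lossy) conflict-resolution strategy is last-write-wins by timestamp. A sketch, not how any particular database implements it; note that the losing write is silently discarded:

from dataclasses import dataclass

@dataclass
class Version:
    value: str
    timestamp: float  # wall-clock time of the write
    node_id: str      # tie-breaker when timestamps collide

def resolve(a: Version, b: Version) -> Version:
    """Last-write-wins: keep the newer write, break exact ties by node id."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))

print(resolve(Version("cart=[shoe]", 17.0, "m1"),
              Version("cart=[]", 18.5, "m3")).value)  # → cart=[]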
Quorum-Based Replication
Read/write from majority of nodes for consistency.
W = 3, R = 3, N = 5
Write to 3 nodes, read from 3
R + W > N guarantees every read quorum overlaps the latest write quorum
- • Tunable consistency
- • Handles node failures
- • Higher latency
- • Used in Cassandra, DynamoDB
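A sketch of the R + W > N rule and a quorum read; versioned replica responses are assumed:

def check_quorum(n: int, w: int, r: int) -> None:
    if w + r <= n:
        raise ValueError("need R + W > N, otherwise a read quorum can miss the latest write")

def quorum_read(responses: list) -> dict:
    """Given R replica responses like {'version': 7, 'value': ...}, the quorum
    overlap guarantees the freshest response includes the last committed write."""
    return max(responses, key=lambda resp: resp["version"])

check_quorum(n=5, w=3, r=3)    # OK: 3 + 3 > 5
# check_quorum(n=5, w=3, r=2)  # would raise: 3 + 2 is not > 5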
Consistency Models Explained
Strong Consistency
All reads return the most recent write. Operations appear instantaneous and atomic.
Write(balance=1000) → Read() = 1000 (immediately, from any node)
Bounded Staleness
Reads may be stale but staleness is bounded by time or version.
Write(price=100) → Read() = 99 (old value for up to 5 seconds)
Session Consistency
Within a session, reads reflect previous writes from that session.
Session1: Write(item=shoe) → Read() = [shoe] ✓
Session2: Read() = [] (may not see shoe yet)
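Read-your-writes and session consistency are often implemented by carrying a version (or log position) token in the session. A sketch against a hypothetical replica API that returns (value, version) pairs:

class Session:
    """Track the highest version this session has written and skip
    replicas that have not caught up to it (read-your-writes)."""
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self.min_version = 0

    def write(self, key, value):
        # Hypothetical API: the primary returns the version it assigned.
        self.min_version = self.primary.write(key, value)

    def read(self, key):
        for replica in self.replicas:
            value, version = replica.read(key)  # hypothetical (value, version) API
            if version >= self.min_version:
                return value
        value, _ = self.primary.read(key)  # no replica is fresh enough: fall back to primary
        return value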
Eventual Consistency
System will become consistent over time, assuming no new updates.
Write(likes++) → Read() = various values → Eventually all nodes = same
Consensus Algorithms
Raft
Understandable consensus algorithm with leader election.
1. Leader election
2. Log replication
3. Safety guarantees
4. Membership changes
Used in: etcd, Consul, CockroachDB
Paxos
Original consensus algorithm, complex but proven.
1. Prepare phase
2. Promise phase
3. Accept phase
4. Accepted notification
Used in: Google Chubby; Apache ZooKeeper uses the closely related ZAB protocol
Byzantine Fault Tolerance
Handles malicious nodes, not just failures.
• Tolerates up to f malicious nodes with 3f + 1 replicas
• Message authentication
• View changes
Used in: Blockchain, PBFT systems
Gossip Protocol
Eventually consistent information dissemination.
• Anti-entropy
• Rumor mongering
• Convergent
Used in: Cassandra, Riak, Consul
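The anti-entropy half of gossip boils down to "pick a random peer and merge states, keeping the newer version of each key". A minimal sketch with per-key version numbers:

import random

def merge(local: dict, remote: dict) -> dict:
    """Anti-entropy: for each key keep the (value, version) pair with the higher version."""
    merged = dict(local)
    for key, (value, version) in remote.items():
        if key not in merged or version > merged[key][1]:
            merged[key] = (value, version)
    return merged

def gossip_round(node_state: dict, peer_states: list) -> dict:
    peer = random.choice(peer_states)  # rumor mongering: one random peer per round
    return merge(node_state, peer)     # repeated rounds converge across the cluster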
How Tech Giants Scale
Google Spanner
Globally distributed database with strong consistency using atomic clocks.
Scale:
- • Millions of machines
- • Hundreds of datacenters
- • Trillions of rows
- • Petabytes of data
Key techniques:
- • TrueTime API for global ordering
- • Paxos for consensus
- • 2PL for transactions
- • Automatic resharding
Facebook TAO
Graph-based data store for the social graph, serving billions of reads per second.
Scale:
- • 1+ billion users
- • Trillions of edges
- • Billions of queries/sec
- • Sub-millisecond latency
Key techniques:
- • Write-through cache
- • Geo-distributed caching
- • Eventual consistency
- • Asynchronous replication
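Write-through caching means a write is applied to the backing store synchronously before it is acknowledged, and the cache is updated in the same operation, so cached reads never return data the database has not accepted. A generic sketch, not Facebook's actual implementation:

class WriteThroughCache:
    def __init__(self, store):
        self.store = store
        self.cache = {}

    def write(self, key, value):
        self.store.write(key, value)   # persist first
        self.cache[key] = value        # then populate the cache

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.store.read(key)  # lazy fill on a miss
        return self.cache[key]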
Amazon DynamoDB
Fully managed NoSQL with consistent single-digit millisecond latency.
Scale:
- • 10 trillion requests/day
- • Peaks of 20M requests/sec
- • Petabyte-scale tables
- • 99.999% availability
Key techniques:
- • Consistent hashing
- • Quorum-based replication
- • Adaptive capacity
- • Global secondary indexes
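The consistent hashing listed above avoids the mass remapping of modulo sharding: each node owns arcs of a hash ring, so adding or removing a node only moves the keys on its arcs. A compact sketch with virtual nodes (the parameters are illustrative):

import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Several virtual points per physical node smooth out the distribution.
        self.ring = sorted((_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
        self.points = [point for point, _ in self.ring]

    def node_for(self, key: str) -> str:
        idx = bisect.bisect(self.points, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user_123"))  # one of node-a / node-b / node-c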
Advanced Scaling Pitfalls
Distributed Monolith
Breaking a monolith into services that still require synchronous communication. You get all the complexity of microservices with none of the benefits.
Ignoring Network Partitions
Assuming network is reliable. Network failures are inevitable - design for split-brain scenarios and have clear partition handling strategies.
Over-Engineering Consistency
Using strong consistency everywhere when eventual consistency would suffice. Not all data needs ACID guarantees - identify where you can relax consistency.
Cascading Failures
No circuit breakers or bulkheads, so a single service failure brings down the entire system. Implement timeouts, retries with backoff, and circuit breakers (see the sketch below).
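A compact circuit breaker with exponential backoff retries; the thresholds and timings are illustrative:

import random
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failures = 0
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.opened_at = None

    def call(self, fn, *args, retries=3):
        if self.opened_at and time.time() - self.opened_at < self.reset_timeout:
            raise RuntimeError("circuit open: failing fast instead of piling on a sick service")
        for attempt in range(retries):
            try:
                result = fn(*args)
                self.failures = 0            # success closes the circuit
                self.opened_at = None
                return result
            except Exception:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.opened_at = time.time()   # trip the breaker
                    raise
                # Exponential backoff with jitter before the next attempt.
                time.sleep((2 ** attempt) * 0.1 + random.random() * 0.1)
        raise RuntimeError("all retries exhausted")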
Advanced Scaling Best Practices
✅ Do
- • Design for failure from day one
- • Use async communication when possible
- • Implement proper observability
- • Test chaos engineering scenarios
- • Plan for data growth (10x, 100x)
- • Use caching at multiple layers
- • Implement backpressure mechanisms
❌ Don't
- • Ignore the CAP theorem trade-offs
- • Use distributed transactions carelessly
- • Forget about network latency
- • Overlook data consistency requirements
- • Skip capacity planning
- • Ignore monitoring and alerting
- • Assume infinite resources