Latency vs Throughput

Core Definitions

Latency

Time to complete a single operation

  • How long does one request take?
  • Measured in time units (ms, seconds)
  • Focused on user experience
  • Example: "Page loads in 200ms"

Throughput

Number of operations per unit time

  • How many requests can you handle?
  • Measured in operations per unit time (RPS, QPS)
  • Focused on system capacity
  • Example: "Handles 10,000 requests/sec"

Analogy: Think of a highway. Latency is how long it takes one car to travel from A to B. Throughput is how many cars can use the highway per hour.

The Relationship

High Latency System
Latency: 1000 ms
Throughput: 10 ops/sec

Low Latency System
Latency: 1 ms
Throughput: 100,000 ops/sec

Independent Metrics

  • High latency does not imply low throughput
  • Low latency does not imply high throughput
  • You can optimize each separately
  • Sometimes they compete with each other

When They're Related

  • Throughput = 1 / Latency (serial processing; see the sketch below)
  • Concurrency breaks this relationship
  • Parallel processing increases throughput without reducing latency
  • Batching trades latency for throughput
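To make the serial relationship concrete, here is a minimal Python sketch. The 100 ms latency and worker counts are illustrative assumptions, and the scaling shown is the ideal (linear) case:

```python
# A minimal sketch of the serial relationship, assuming an illustrative
# 100 ms operation and ideal (linear) scaling with worker count.

latency_s = 0.100  # one operation takes 100 ms

# Serial processing: throughput = 1 / latency
serial_throughput = 1 / latency_s
print(f"Serial: {serial_throughput:.0f} ops/sec")  # -> 10 ops/sec

# With N workers running in parallel, ideal throughput scales linearly
# while per-operation latency stays at 100 ms.
for workers in (1, 10, 100):
    print(f"{workers:>3} workers: {workers / latency_s:>6.0f} ops/sec")
```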

Real-world Examples

Database Query
Latency: 10ms
Throughput: 1000 queries/sec

A single query is fast, but the database can handle many concurrent queries

File Upload
Latency: 2000ms
Throughput: 50 uploads/sec

Each upload takes time, but the server can process multiple uploads in parallel

Cache Lookup
Latency: 1ms
Throughput: 100K ops/sec

Individual operations are very fast, giving extremely high throughput

ML Model Inference
Latency: 500ms
Throughput: 20 predictions/sec

Complex computation takes time, and concurrent processing is limited
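These example numbers can be sanity-checked with Little's Law (L = λW): average concurrency equals throughput times latency. A sketch applying it to the illustrative figures above:

```python
# Little's Law: concurrency (L) = throughput (lambda) x latency (W).
# Numbers are the illustrative ones from the examples above.

examples = {
    "Database query":     (0.010, 1_000),    # 10 ms,   1,000 queries/sec
    "File upload":        (2.000, 50),       # 2000 ms, 50 uploads/sec
    "Cache lookup":       (0.001, 100_000),  # 1 ms,    100K ops/sec
    "ML model inference": (0.500, 20),       # 500 ms,  20 predictions/sec
}

for name, (latency_s, throughput) in examples.items():
    concurrency = throughput * latency_s
    print(f"{name:<18} -> ~{concurrency:g} operations in flight")
```

A database handling 1000 queries/sec at 10 ms each is serving about 10 queries at any instant; the file-upload server is juggling around 100.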

Optimization Strategies

Optimizing Latency

  • Caching frequently accessed data
  • Database indexing and query optimization
  • CDN for static content delivery
  • Reducing network round trips
  • Algorithmic improvements
  • Hardware upgrades (SSD, more RAM)

Optimizing Throughput

  • Horizontal scaling (more servers)
  • Load balancing across instances
  • Asynchronous processing (see the sketch after this list)
  • Connection pooling
  • Batch processing operations
  • Optimizing resource utilization
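As a rough illustration of the asynchronous/parallel strategies above, the sketch below simulates I/O-bound requests with a thread pool. The `simulate_request` function and its 50 ms sleep are stand-ins for a real network call, not a real service:

```python
# Throughput scaling via concurrency for I/O-bound work (illustrative).
import time
from concurrent.futures import ThreadPoolExecutor

def simulate_request(_: int) -> None:
    time.sleep(0.05)  # stand-in for a ~50 ms network call

N_REQUESTS = 100

for workers in (1, 10, 50):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(simulate_request, range(N_REQUESTS)))
    elapsed = time.perf_counter() - start
    # Per-request latency is still ~50 ms; only throughput changes.
    print(f"{workers:>2} workers: {N_REQUESTS / elapsed:6.0f} req/sec")
```

Notice that adding workers multiplies throughput while each request still takes about 50 ms: concurrency improves one metric without touching the other.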

Common Trade-offs

Batch Processing vs Real-time

Latency: high (minutes to hours)
Throughput: very high
Use Case: Data analytics, ETL jobs, reporting

Trade-off: Accept high latency for maximum throughput efficiency

Caching vs Fresh Data

Latency: low (milliseconds)
Throughput: high
Use Case: Web applications, API responses

Trade-off: Accept potentially stale data for speed

Compression vs Processing

Latency: higher (compression adds CPU overhead)
Throughput: higher (less bandwidth used)
Use Case: Large file transfers, network optimization

Trade-off: Accept CPU cost for network efficiency
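A quick way to feel this trade-off is to time it directly. This sketch uses Python's standard gzip module; the payload is synthetic, so real ratios and timings will differ:

```python
# Timing the compression trade-off with the standard gzip module.
# The payload is synthetic; real ratios and timings will differ.
import gzip
import time

payload = b"some repetitive application data " * 50_000  # ~1.6 MB

start = time.perf_counter()
compressed = gzip.compress(payload)
cpu_cost_ms = (time.perf_counter() - start) * 1000

print(f"Original:   {len(payload):>9,} bytes")
print(f"Compressed: {len(compressed):>9,} bytes "
      f"({len(compressed) / len(payload):.1%} of original)")
print(f"CPU cost:   {cpu_cost_ms:.1f} ms of added latency per payload")
# Compression wins when the transfer time saved exceeds this CPU cost.
```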

Measuring Performance

Latency Metrics

  • P50 (median): half of requests are faster than this
  • P95: 95% of requests are faster than this
  • P99: only 1% of requests are slower than this
  • P99.9: captures worst-case outliers
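Here is a sketch of how these percentiles are computed from raw latency samples. The samples are synthetic (mostly fast, with a slow tail); a real system would pull them from request logs or a metrics store:

```python
# Computing latency percentiles from raw samples (nearest-rank method).
# The samples are synthetic: mostly fast, with a slow tail.
import random

random.seed(42)
samples_ms = sorted(random.lognormvariate(3.0, 0.8) for _ in range(10_000))

def percentile(sorted_samples, p):
    # Value below which roughly p% of samples fall.
    index = min(len(sorted_samples) - 1, int(len(sorted_samples) * p / 100))
    return sorted_samples[index]

for p in (50, 95, 99, 99.9):
    print(f"P{p:g}: {percentile(samples_ms, p):7.1f} ms")
```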

Throughput Metrics

  • RPS: Requests per second
  • QPS: Queries per second
  • TPS: Transactions per second
  • Mbps: Megabits per second
  • IOPS: I/O operations per second

Pro tip: Always measure both metrics. A system with great P50 latency but terrible P99 will have poor user experience for some users.

System Design Decisions

Latency-Critical Systems

Examples: Trading systems, gaming, real-time chat, video calls

  • • Optimize for speed of individual operations
  • • Use caching aggressively
  • • Minimize network hops
  • • Keep data close to computation

Throughput-Critical Systems

Examples: Batch processing, data pipelines, web crawlers, analytics

  • • Optimize for maximum concurrent operations
  • • Use batching and queuing
  • • Horizontal scaling
  • • Parallel processing

Balanced Systems

Examples: Web applications, APIs, e-commerce, social media

  • • Need both reasonable latency and good throughput
  • • Use tiered optimization strategies
  • • Monitor both metrics closely
  • • Make trade-offs based on user impact

🧮 Performance Calculator

Compare latency vs throughput trade-offs for different system configurations

Scaling Comparison

Vertical Scaling: $15,811/month

Vertical scaling becomes 31.6x more expensive due to premium hardware costs.

  • Scale Factor: 10x
  • Cost Multiplier: 31.6x
  • Total Servers: 1 (bigger)

Horizontal Scaling: $5,000/month

Horizontal scaling requires 10 servers, but costs scale linearly.

  • Scale Factor: 10x
  • Number of Servers: 10
  • Cost per Server: $500

💡 Quick Comparison: Horizontal scaling is 3.2x cheaper for this scenario.
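The calculator's 31.6x multiplier equals 10^1.5, which suggests the widget assumes vertical cost grows roughly with scale^1.5; that exponent and the $500 base cost are inferred from the numbers shown, not documented. A sketch reproducing the comparison under that assumption:

```python
# Reproducing the comparison under an assumed superlinear cost model:
# vertical cost ~ scale^1.5 (inferred), horizontal cost ~ scale (linear).
BASE_COST = 500  # $/month for one baseline server (from the widget)
SCALE = 10       # 10x growth

vertical = BASE_COST * SCALE ** 1.5  # 500 * 31.6 = ~$15,811/month
horizontal = BASE_COST * SCALE       # 10 servers at $500 = $5,000/month

print(f"Vertical:   ${vertical:,.0f}/month ({SCALE ** 1.5:.1f}x multiplier)")
print(f"Horizontal: ${horizontal:,.0f}/month ({SCALE} servers)")
print(f"Horizontal is {vertical / horizontal:.1f}x cheaper in this scenario")
```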

🎯 Latency vs Throughput Trade-off Scenarios

Learn from real-world decisions about optimizing for latency versus throughput

Scenarios

  • Real-time Trading System: high-frequency trading platform prioritizing ultra-low latency
  • Video Streaming CDN: Netflix-style platform optimizing for throughput while maintaining quality
  • Social Media Feed Generation: Facebook-style news feed balancing real-time updates with system capacity
  • E-commerce Search Engine: Amazon-style product search optimizing for both speed and relevance
  • Gaming Backend Services: multiplayer game server balancing real-time gameplay with massive player counts
  • Batch Analytics Pipeline: data warehouse processing choosing throughput over real-time insights

Context

Real-time Trading System: a high-frequency trading platform prioritizing ultra-low latency

Metrics

  • Average Latency: 50 microseconds
  • Peak Throughput: 10,000 trades/sec
  • Hardware Cost: $2M annually
  • Revenue Impact: $50M+ advantage

Outcome

Massive investment in low-latency infrastructure pays off through competitive advantage in microsecond-sensitive trading.

Key Lessons

  • Co-location with exchanges reduces network latency
  • Custom hardware (FPGAs) eliminates software overhead
  • Dedicated network infrastructure bypasses internet routing
  • Latency optimization can justify extreme costs in high-value scenarios

📝 Latency vs Throughput Quiz

Question 1 of 5

A system can process 1 request in 100ms. If processed serially, what is the maximum throughput?