Latency vs Throughput

Core Definitions

Latency

Time to complete a single operation

  • How long does one request take?
  • Measured in time units (ms, seconds)
  • Focused on user experience
  • Example: "Page loads in 200ms"

Throughput

Number of operations per unit time

  • How many requests can you handle?
  • Measured in operations per unit time (RPS, QPS)
  • Focused on system capacity
  • Example: "Handles 10,000 requests/sec"

Analogy: Think of a highway. Latency is how long it takes one car to travel from A to B. Throughput is how many cars can use the highway per hour.

The Relationship

High Latency System
Latency: 1000 ms
Throughput: 10 ops/sec

Low Latency System
Latency: 1 ms
Throughput: 100,000 ops/sec

Independent Metrics

  • High latency does not imply low throughput
  • Low latency does not imply high throughput
  • You can optimize each separately
  • Sometimes they compete with each other

When They're Related

  • Throughput = 1 / Latency (serial processing; see the sketch below)
  • Concurrency breaks this relationship
  • Parallel processing increases throughput without reducing latency
  • Batching trades latency for throughput
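To make the serial relationship concrete, here is a minimal Python sketch. The 100 ms latency and worker counts are illustrative assumptions, and the scaling shown is the ideal (linear) case:

```python
# A minimal sketch of the serial relationship, assuming an illustrative
# 100 ms operation and ideal (linear) scaling with worker count.

latency_s = 0.100  # one operation takes 100 ms

# Serial processing: throughput = 1 / latency
serial_throughput = 1 / latency_s
print(f"Serial: {serial_throughput:.0f} ops/sec")  # -> 10 ops/sec

# With N workers running in parallel, ideal throughput scales linearly
# while per-operation latency stays at 100 ms.
for workers in (1, 10, 100):
    print(f"{workers:>3} workers: {workers / latency_s:>6.0f} ops/sec")
```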

Real-world Examples

Database Query
Latency: 10ms
Throughput: 1000 queries/sec

A single query is fast, but the database can handle many concurrent queries

File Upload
Latency: 2000ms
Throughput: 50 uploads/sec

Each upload takes time, but the server can process multiple uploads in parallel

Cache Lookup
Latency: 1ms
Throughput: 100K ops/sec

Individual operations are very fast, giving extremely high throughput

ML Model Inference
Latency: 500ms
Throughput: 20 predictions/sec

Complex computation takes time, and concurrent processing is limited
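These example numbers can be sanity-checked with Little's Law (L = λW): average concurrency equals throughput times latency. A sketch applying it to the illustrative figures above:

```python
# Little's Law: concurrency (L) = throughput (lambda) x latency (W).
# Numbers are the illustrative ones from the examples above.

examples = {
    "Database query":     (0.010, 1_000),    # 10 ms,   1,000 queries/sec
    "File upload":        (2.000, 50),       # 2000 ms, 50 uploads/sec
    "Cache lookup":       (0.001, 100_000),  # 1 ms,    100K ops/sec
    "ML model inference": (0.500, 20),       # 500 ms,  20 predictions/sec
}

for name, (latency_s, throughput) in examples.items():
    concurrency = throughput * latency_s
    print(f"{name:<18} -> ~{concurrency:g} operations in flight")
```

A database handling 1000 queries/sec at 10 ms each is serving about 10 queries at any instant; the file-upload server is juggling around 100.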

Optimization Strategies

Optimizing Latency

  • Caching frequently accessed data
  • Database indexing and query optimization
  • CDN for static content delivery
  • Reducing network round trips
  • Algorithmic improvements
  • Hardware upgrades (SSD, more RAM)

Optimizing Throughput

  • Horizontal scaling (more servers)
  • Load balancing across instances
  • Asynchronous processing (see the sketch after this list)
  • Connection pooling
  • Batch processing operations
  • Optimizing resource utilization
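As a rough illustration of the asynchronous/parallel strategies above, the sketch below simulates I/O-bound requests with a thread pool. The `simulate_request` function and its 50 ms sleep are stand-ins for a real network call, not a real service:

```python
# Throughput scaling via concurrency for I/O-bound work (illustrative).
import time
from concurrent.futures import ThreadPoolExecutor

def simulate_request(_: int) -> None:
    time.sleep(0.05)  # stand-in for a ~50 ms network call

N_REQUESTS = 100

for workers in (1, 10, 50):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(simulate_request, range(N_REQUESTS)))
    elapsed = time.perf_counter() - start
    # Per-request latency is still ~50 ms; only throughput changes.
    print(f"{workers:>2} workers: {N_REQUESTS / elapsed:6.0f} req/sec")
```

Notice that adding workers multiplies throughput while each request still takes about 50 ms: concurrency improves one metric without touching the other.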

Common Trade-offs

Batch Processing vs Real-time

Latency: high (minutes to hours)
Throughput: very high
Use Case: Data analytics, ETL jobs, reporting

Trade-off: Accept high latency for maximum throughput efficiency

Caching vs Fresh Data

Latency: low (milliseconds)
Throughput: high
Use Case: Web applications, API responses

Trade-off: Accept potentially stale data for speed

Compression vs Processing

Latency: higher (compression adds CPU overhead)
Throughput: higher (less bandwidth used)
Use Case: Large file transfers, network optimization

Trade-off: Accept CPU cost for network efficiency
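A quick way to feel this trade-off is to time it directly. This sketch uses Python's standard gzip module; the payload is synthetic, so real ratios and timings will differ:

```python
# Timing the compression trade-off with the standard gzip module.
# The payload is synthetic; real ratios and timings will differ.
import gzip
import time

payload = b"some repetitive application data " * 50_000  # ~1.6 MB

start = time.perf_counter()
compressed = gzip.compress(payload)
cpu_cost_ms = (time.perf_counter() - start) * 1000

print(f"Original:   {len(payload):>9,} bytes")
print(f"Compressed: {len(compressed):>9,} bytes "
      f"({len(compressed) / len(payload):.1%} of original)")
print(f"CPU cost:   {cpu_cost_ms:.1f} ms of added latency per payload")
# Compression wins when the transfer time saved exceeds this CPU cost.
```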

Measuring Performance

Latency Metrics

  • P50 (median): half of requests are faster than this
  • P95: 95% of requests are faster than this
  • P99: only 1% of requests are slower than this
  • P99.9: captures worst-case outliers
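Here is a sketch of how these percentiles are computed from raw latency samples. The samples are synthetic (mostly fast, with a slow tail); a real system would pull them from request logs or a metrics store:

```python
# Computing latency percentiles from raw samples (nearest-rank method).
# The samples are synthetic: mostly fast, with a slow tail.
import random

random.seed(42)
samples_ms = sorted(random.lognormvariate(3.0, 0.8) for _ in range(10_000))

def percentile(sorted_samples, p):
    # Value below which roughly p% of samples fall.
    index = min(len(sorted_samples) - 1, int(len(sorted_samples) * p / 100))
    return sorted_samples[index]

for p in (50, 95, 99, 99.9):
    print(f"P{p:g}: {percentile(samples_ms, p):7.1f} ms")
```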

Throughput Metrics

  • RPS: Requests per second
  • QPS: Queries per second
  • TPS: Transactions per second
  • Mbps: Megabits per second
  • IOPS: I/O operations per second

Pro tip: Always measure both metrics. A system with great P50 latency but terrible P99 will have poor user experience for some users.

System Design Decisions

Latency-Critical Systems

Examples: Trading systems, gaming, real-time chat, video calls

  • • Optimize for speed of individual operations
  • • Use caching aggressively
  • • Minimize network hops
  • • Keep data close to computation

Throughput-Critical Systems

Examples: Batch processing, data pipelines, web crawlers, analytics

  • • Optimize for maximum concurrent operations
  • • Use batching and queuing
  • • Horizontal scaling
  • • Parallel processing

Balanced Systems

Examples: Web applications, APIs, e-commerce, social media

  • • Need both reasonable latency and good throughput
  • • Use tiered optimization strategies
  • • Monitor both metrics closely
  • • Make trade-offs based on user impact

🧮 Performance Calculator

Compare latency vs throughput trade-offs for different system configurations

Scaling Comparison

Vertical Scaling: $15,811/month

Vertical scaling becomes 31.6x more expensive due to premium hardware costs.

  • Scale Factor: 10x
  • Cost Multiplier: 31.6x
  • Total Servers: 1 (bigger)

Horizontal Scaling: $5,000/month

Horizontal scaling requires 10 servers, but costs scale linearly.

  • Scale Factor: 10x
  • Number of Servers: 10
  • Cost per Server: $500

💡 Quick Comparison: Horizontal scaling is 3.2x cheaper for this scenario.
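The calculator's 31.6x multiplier equals 10^1.5, which suggests the widget assumes vertical cost grows roughly with scale^1.5; that exponent and the $500 base cost are inferred from the numbers shown, not documented. A sketch reproducing the comparison under that assumption:

```python
# Reproducing the comparison under an assumed superlinear cost model:
# vertical cost ~ scale^1.5 (inferred), horizontal cost ~ scale (linear).
BASE_COST = 500  # $/month for one baseline server (from the widget)
SCALE = 10       # 10x growth

vertical = BASE_COST * SCALE ** 1.5  # 500 * 31.6 = ~$15,811/month
horizontal = BASE_COST * SCALE       # 10 servers at $500 = $5,000/month

print(f"Vertical:   ${vertical:,.0f}/month ({SCALE ** 1.5:.1f}x multiplier)")
print(f"Horizontal: ${horizontal:,.0f}/month ({SCALE} servers)")
print(f"Horizontal is {vertical / horizontal:.1f}x cheaper in this scenario")
```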

🎯 Latency vs Throughput Trade-off Scenarios

Learn from real-world decisions about optimizing for latency versus throughput

Scenarios

  • Real-time Trading System: high-frequency trading platform prioritizing ultra-low latency
  • Video Streaming CDN: Netflix-style platform optimizing for throughput while maintaining quality
  • Social Media Feed Generation: Facebook-style news feed balancing real-time updates with system capacity
  • E-commerce Search Engine: Amazon-style product search optimizing for both speed and relevance
  • Gaming Backend Services: multiplayer game server balancing real-time gameplay with massive player counts
  • Batch Analytics Pipeline: data warehouse processing choosing throughput over real-time insights

Context

Real-time Trading System: a high-frequency trading platform prioritizing ultra-low latency

Metrics

  • Average Latency: 50 microseconds
  • Peak Throughput: 10,000 trades/sec
  • Hardware Cost: $2M annually
  • Revenue Impact: $50M+ advantage

Outcome

Massive investment in low-latency infrastructure pays off through competitive advantage in microsecond-sensitive trading.

Key Lessons

  • Co-location with exchanges reduces network latency
  • Custom hardware (FPGAs) eliminates software overhead
  • Dedicated network infrastructure bypasses internet routing
  • Latency optimization can justify extreme costs in high-value scenarios

📝 Latency vs Throughput Quiz

Question 1 of 5

A system can process 1 request in 100ms. If processed serially, what is the maximum throughput?