Core Definitions
Latency
Time to complete a single operation
- • How long does one request take?
- • Measured in time units (ms, seconds)
- • User experience focus
- • Example: "Page loads in 200ms"
Throughput
Number of operations per unit time
- • How many requests can you handle?
- • Measured in operations/time (RPS, QPS)
- • System capacity focus
- • Example: "Handles 10,000 requests/sec"
Analogy: Think of a highway. Latency is how long it takes one car to travel from A to B. Throughput is how many cars can use the highway per hour.
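To make the two definitions concrete, here is a minimal Python sketch that measures both for the same operation (handle_request is a hypothetical stand-in for real request handling):

```python
import time

def handle_request() -> int:
    # Stand-in for real work (e.g., parsing plus a small computation).
    return sum(i * i for i in range(1000))

# Latency: how long one operation takes.
start = time.perf_counter()
handle_request()
latency_s = time.perf_counter() - start
print(f"latency: {latency_s * 1000:.3f} ms")

# Throughput: how many operations complete per unit time.
n, start = 10_000, time.perf_counter()
for _ in range(n):
    handle_request()
elapsed = time.perf_counter() - start
print(f"throughput: {n / elapsed:,.0f} ops/sec")
```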
The Relationship
Independent Metrics
- • High latency ≠ Low throughput
- • Low latency ≠ High throughput
- • You can optimize each separately
- • Sometimes they compete with each other
When They're Related
- • Throughput = 1 / Latency (serial processing)
- • Concurrency breaks this relationship (see the sketch after this list)
- • Parallel processing increases throughput
- • Batching trades latency for throughput
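The sketch below illustrates this, assuming a simulated 50 ms I/O call (the timings are illustrative, not from any real system): serial throughput is pinned at 1 / latency, while ten threads overlapping the waits multiply throughput without changing per-call latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_call() -> None:
    time.sleep(0.05)  # 50 ms of simulated I/O latency per operation

N = 40

# Serial: throughput is locked to 1 / latency (~20 ops/sec here).
start = time.perf_counter()
for _ in range(N):
    io_call()
print(f"serial:   {N / (time.perf_counter() - start):.0f} ops/sec")

# Concurrent: per-call latency is unchanged (50 ms), but 10 workers
# overlap the waits, so throughput rises roughly 10x.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    list(pool.map(lambda _: io_call(), range(N)))
print(f"parallel: {N / (time.perf_counter() - start):.0f} ops/sec")
```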
Real-world Examples
- • Databases: a single query is fast, and the same database can handle many concurrent queries
- • File uploads: each upload takes time, but the server can process multiple uploads in parallel
- • In-memory caches: very fast individual operations and extremely high throughput
- • Heavy computation: a complex task takes time, with limited concurrent processing
Optimization Strategies
Optimizing Latency
- • Caching frequently accessed data (sketched below)
- • Database indexing and query optimization
- • CDN for static content delivery
- • Reducing network round trips
- • Algorithmic improvements
- • Hardware upgrades (SSD, more RAM)
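As a quick illustration of the caching item, a sketch using Python's functools.lru_cache over a simulated 20 ms database round trip (the delay is an assumed number for demonstration):

```python
import time
from functools import lru_cache

@lru_cache(maxsize=None)
def lookup(key: int) -> int:
    time.sleep(0.02)  # simulate a 20 ms database round trip
    return key * 2

def timed_ms(fn, *args) -> float:
    start = time.perf_counter()
    fn(*args)
    return (time.perf_counter() - start) * 1000

print(f"cold (miss): {timed_ms(lookup, 42):.1f} ms")   # pays the full round trip
print(f"warm (hit):  {timed_ms(lookup, 42):.3f} ms")   # served from memory
```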
Optimizing Throughput
- • Horizontal scaling (more servers)
- • Load balancing across instances
- • Asynchronous processing
- • Connection pooling
- • Batch processing operations (sketched below)
- • Optimizing resource utilization
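The batching item is worth seeing in numbers. This sketch assumes a fixed 5 ms per-call overhead and 0.1 ms of work per item (both invented): batching amortizes the overhead, raising throughput while individual items wait longer.

```python
import time

PER_CALL_OVERHEAD = 0.005   # 5 ms fixed cost per round trip (assumed)
PER_ITEM_COST = 0.0001      # 0.1 ms of actual work per item (assumed)

def process(items: list[int]) -> None:
    time.sleep(PER_CALL_OVERHEAD + PER_ITEM_COST * len(items))

N = 200

# One item per call: every item pays the full overhead.
start = time.perf_counter()
for i in range(N):
    process([i])
print(f"unbatched: {N / (time.perf_counter() - start):.0f} items/sec")

# 50-item batches: the overhead is amortized and throughput jumps, but
# each item now waits for its whole batch to be collected and processed.
start = time.perf_counter()
for i in range(0, N, 50):
    process(list(range(i, i + 50)))
print(f"batched:   {N / (time.perf_counter() - start):.0f} items/sec")
```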
Common Trade-offs
Batch Processing vs Real-time
Trade-off: Accept high latency for maximum throughput efficiency
Caching vs Fresh Data
Trade-off: Accept potentially stale data for speed
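A minimal sketch of that trade-off, using a tiny TTL cache (illustrative only, not production code): reads within the TTL window are fast but may be stale.

```python
import time

def fetch_price(key: str) -> float:
    time.sleep(0.02)          # simulate a 20 ms database read
    return 9.99

class TTLCache:
    """Tiny TTL cache: trades freshness (up to ttl_s of staleness) for speed."""
    def __init__(self, ttl_s: float):
        self.ttl_s = ttl_s
        self.store: dict = {}  # key -> (value, expiry timestamp)

    def get_or_load(self, key, loader):
        hit = self.store.get(key)
        if hit and hit[1] > time.monotonic():
            return hit[0]      # fast path: may be up to ttl_s stale
        value = loader(key)    # slow path: always fresh
        self.store[key] = (value, time.monotonic() + self.ttl_s)
        return value

cache = TTLCache(ttl_s=30.0)
cache.get_or_load("item-1", fetch_price)  # miss: ~20 ms, fresh
cache.get_or_load("item-1", fetch_price)  # hit: microseconds, possibly stale
```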
Compression vs Processing
Trade-off: Accept CPU cost for network efficiency
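One rough way to see this trade-off, using Python's zlib on synthetic data (payload size, compression level, and link speed are all assumptions):

```python
import time
import zlib

payload = b"latency vs throughput " * 50_000  # ~1.1 MB of compressible data

start = time.perf_counter()
compressed = zlib.compress(payload, level=6)
cpu_ms = (time.perf_counter() - start) * 1000

saved = len(payload) - len(compressed)
print(f"CPU spent:   {cpu_ms:.1f} ms")
print(f"bytes saved: {saved:,} ({saved / len(payload):.0%} smaller)")
# On a 100 Mbps link, each MB saved avoids ~80 ms of transfer time,
# so the CPU cost usually pays for itself on slow or costly networks.
```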
Measuring Performance
Latency Metrics
- • P50 (median): the latency a typical request sees
- • P95 / P99: tail latency seen by the slowest 5% / 1% of requests
- • Average: easy to compute, but hides outliers; prefer percentiles
Throughput Metrics
- • RPS: Requests per second
- • QPS: Queries per second
- • TPS: Transactions per second
- • Mbps: Megabits per second
- • IOPS: I/O operations per second
Pro tip: Always measure both metrics. A system with great P50 latency but terrible P99 will have poor user experience for some users.
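Computing those percentiles takes a few lines with statistics.quantiles; the samples below are synthetic, invented to mimic a mostly-fast distribution with a slow tail:

```python
import random
import statistics

# Hypothetical latency samples (ms): 99% fast, 1% very slow.
random.seed(0)
samples = [random.gauss(50, 10) for _ in range(990)] + \
          [random.gauss(800, 100) for _ in range(10)]

# quantiles(n=100) returns the 99 cut points P1..P99.
cuts = statistics.quantiles(samples, n=100)
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(f"P50: {p50:.0f} ms   P95: {p95:.0f} ms   P99: {p99:.0f} ms")
# A healthy-looking P50 (~50 ms) coexists with a P99 in the hundreds of ms:
# exactly the situation the pro tip above warns about.
```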
System Design Decisions
Latency-Critical Systems
Examples: Trading systems, gaming, real-time chat, video calls
- • Optimize for speed of individual operations
- • Use caching aggressively
- • Minimize network hops
- • Keep data close to computation
Throughput-Critical Systems
Examples: Batch processing, data pipelines, web crawlers, analytics
- • Optimize for maximum concurrent operations
- • Use batching and queuing
- • Horizontal scaling
- • Parallel processing
Balanced Systems
Examples: Web applications, APIs, e-commerce, social media
- • Need both reasonable latency and good throughput
- • Use tiered optimization strategies
- • Monitor both metrics closely
- • Make trade-offs based on user impact
🧮 Performance Calculator
Compare latency vs throughput trade-offs for different system configurations
Scaling Comparison
Vertical Scaling
With the sample configuration, vertical scaling becomes 31.6x more expensive due to premium hardware costs
Horizontal Scaling
With the same configuration, horizontal scaling requires 10 servers, but costs scale linearly with server count
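The formula behind these numbers isn't shown; one model consistent with them (an assumption, not the calculator's documented behavior) is vertical cost growing with capacity^1.5, since 10^1.5 ≈ 31.6, while horizontal cost stays linear in server count:

```python
# Assumed cost model (not taken from the page): vertical hardware cost
# grows super-linearly with capacity, horizontal cost grows linearly.
servers = 10              # capacity multiple needed over one baseline server
VERTICAL_EXPONENT = 1.5   # assumption: premium hardware cost ~ capacity^1.5

vertical_cost = servers ** VERTICAL_EXPONENT   # 10^1.5 ≈ 31.6x one server
horizontal_cost = servers                      # 10 servers ≈ 10x one server
print(f"vertical:   {vertical_cost:.1f}x baseline cost")
print(f"horizontal: {horizontal_cost:.1f}x baseline cost")
```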
🎯 Latency vs Throughput Trade-off Scenarios
Learn from real-world decisions about optimizing for latency versus throughput
Context
High-frequency trading platform prioritizing ultra-low latency
Outcome
Massive investment in low-latency infrastructure pays off through competitive advantage in millisecond-sensitive trading.
Key Lessons
- • Co-location with exchanges reduces network latency (see the sketch below)
- • Custom hardware (FPGAs) eliminates software overhead
- • Dedicated network infrastructure bypasses internet routing
- • Latency optimization can justify extreme costs in high-value scenarios
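The co-location lesson follows from physics alone. A back-of-the-envelope sketch, assuming the standard ~200,000 km/s figure for light in optical fiber (an approximation, not a number from the scenario):

```python
# Light in optical fiber covers roughly 200 km per millisecond.
FIBER_KM_PER_MS = 200.0

def round_trip_ms(distance_km: float) -> float:
    return 2 * distance_km / FIBER_KM_PER_MS

print(f"co-located (1 km):       {round_trip_ms(1):.3f} ms")
print(f"cross-country (4000 km): {round_trip_ms(4000):.1f} ms")
# 4000 km adds ~40 ms of round trip before any processing happens,
# an eternity in millisecond-sensitive trading, hence co-location.
```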