Latency vs Throughput
Core Definitions
Latency
Time to complete a single operation
- How long does one request take?
- Measured in time units (ms, seconds)
- User experience focus
- Example: "Page loads in 200ms"
Throughput
Number of operations per unit time
- How many requests can you handle?
- Measured in operations/time (RPS, QPS)
- System capacity focus
- Example: "Handles 10,000 requests/sec"
Analogy: Think of a highway. Latency is how long it takes one car to travel from A to B. Throughput is how many cars can use the highway per hour.
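To make the two metrics concrete, the sketch below measures both on the same workload. It assumes a hypothetical handle_request stub in place of real work:

```python
import time

def handle_request():
    # Hypothetical stand-in for real work (~2 ms per request).
    time.sleep(0.002)

# Latency: time a single operation.
start = time.perf_counter()
handle_request()
latency_ms = (time.perf_counter() - start) * 1000
print(f"latency: {latency_ms:.1f} ms")

# Throughput: count operations completed in a fixed window.
count, window_start = 0, time.perf_counter()
while time.perf_counter() - window_start < 1.0:
    handle_request()
    count += 1
print(f"throughput: {count} requests/sec")
```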
The Relationship
Independent Metrics
- High latency ≠ Low throughput
- Low latency ≠ High throughput
- You can optimize each separately
- Sometimes they compete with each other
When They're Related
- Throughput = 1 / Latency under strictly serial processing
- Concurrency breaks this relationship (see the sketch below)
- Parallel processing increases throughput
- Batching trades latency for throughput
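Little's Law ties the quantities together: concurrency = throughput × latency, so with parallel workers throughput = concurrency / latency. A quick worked example with assumed numbers:

```python
latency_s = 0.2  # assumed: each request takes 200 ms

# Serial processing: one request at a time.
print(f"serial: {1 / latency_s:.0f} req/s")

# Little's Law (L = lambda * W): throughput = concurrency / latency,
# assuming the workers never contend for a shared bottleneck.
for workers in (10, 100):
    print(f"{workers} workers: {workers / latency_s:,.0f} req/s")
```

Per-request latency stays 200 ms throughout; only throughput changes.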
Real-world Examples
- Database queries: a single query is fast, and the database can handle many concurrent queries
- File uploads: each upload takes time, but the server can process multiple uploads in parallel
- In-memory caches: very fast individual operations, extremely high throughput
- Heavy computation: each operation takes time, with limited concurrent processing
Optimization Strategies
Optimizing Latency
- Caching frequently accessed data (see the sketch below)
- Database indexing and query optimization
- CDN for static content delivery
- Reducing network round trips
- Algorithmic improvements
- Hardware upgrades (SSD, more RAM)
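As a sketch of the first strategy, here is a minimal time-to-live cache; TTLCache and the loader callback are illustrative names, not a particular library's API:

```python
import time

class TTLCache:
    """Minimal time-to-live cache sketch (single-threaded)."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key, loader):
        value, expiry = self._store.get(key, (None, 0.0))
        if time.monotonic() < expiry:
            return value                       # fast path: cache hit
        value = loader(key)                    # slow path: fetch/recompute
        self._store[key] = (value, time.monotonic() + self.ttl)
        return value

cache = TTLCache(ttl_seconds=30)
# Hypothetical loader standing in for a slow database read.
user = cache.get("user:42", loader=lambda key: {"id": 42, "name": "Ada"})
```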
Optimizing Throughput
- Horizontal scaling (more servers)
- Load balancing across instances
- Asynchronous processing
- Connection pooling
- Batch processing operations (see the sketch below)
- Optimizing resource utilization
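To make the batching item concrete, here is a rough sketch of a batch worker that accepts a little extra per-item latency in exchange for fewer, larger writes; write_batch and the size and wait limits are assumed placeholders:

```python
import queue
import threading
import time

# Hypothetical sink: writing N rows in one call is far cheaper than
# N separate calls (one round trip instead of N).
def write_batch(rows):
    print(f"wrote {len(rows)} rows in one call")

work = queue.Queue()

def batch_worker(max_batch=100, max_wait_s=0.05):
    while True:
        batch = [work.get()]                  # block until the first item arrives
        try:
            while len(batch) < max_batch:
                # Wait briefly for more items; each item collected here pays
                # up to max_wait_s extra latency to ride in a bigger batch.
                batch.append(work.get(timeout=max_wait_s))
        except queue.Empty:
            pass                              # timed out: flush what we have
        write_batch(batch)

threading.Thread(target=batch_worker, daemon=True).start()

for i in range(250):
    work.put({"row": i})
time.sleep(0.5)  # give the daemon worker time to flush before exiting
```

The same shape appears in database bulk inserts and message-broker producer batching.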
Common Trade-offs
Batch Processing vs Real-time
Trade-off: Accept high latency for maximum throughput efficiency
Caching vs Fresh Data
Trade-off: Accept potentially stale data for speed
Compression vs Processing
Trade-off: Accept CPU cost for network efficiency
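A small experiment with Python's standard gzip module shows the shape of this trade-off; the payload is synthetic, so real ratios and timings will differ:

```python
import gzip
import time

payload = b"GET /api/products?page=1 " * 2000  # repetitive, compresses well

start = time.perf_counter()
compressed = gzip.compress(payload)
cpu_cost_ms = (time.perf_counter() - start) * 1000

print(f"original:   {len(payload):,} bytes")
print(f"compressed: {len(compressed):,} bytes "
      f"({len(compressed) / len(payload):.1%} of original)")
print(f"CPU cost:   {cpu_cost_ms:.2f} ms")
```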
Measuring Performance
Latency Metrics
- P50 (median): half of requests complete faster than this
- P95 / P99: 95% / 99% of requests complete faster (tail latency)
- Average: mean response time (can hide a bad tail)
Throughput Metrics
- RPS: Requests per second
- QPS: Queries per second
- TPS: Transactions per second
- Mbps: Megabits per second
- IOPS: I/O operations per second
Pro tip: Always measure both metrics. A system with great P50 latency but terrible P99 will have poor user experience for some users.
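Percentiles are easy to compute from raw samples. The sketch below uses synthetic latencies with a deliberate slow tail, which is exactly the case the tip warns about:

```python
import random
import statistics

# Synthetic latency samples (ms): 99% fast, 1% slow tail.
samples = [random.gauss(50, 10) for _ in range(9900)]
samples += [random.gauss(800, 200) for _ in range(100)]

cuts = statistics.quantiles(samples, n=100)  # 99 percentile cut points
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(f"P50 {p50:.0f} ms | P95 {p95:.0f} ms | P99 {p99:.0f} ms")
```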
System Design Decisions
Latency-Critical Systems
Examples: Trading systems, gaming, real-time chat, video calls
- Optimize for speed of individual operations
- Use caching aggressively
- Minimize network hops
- Keep data close to computation
Throughput-Critical Systems
Examples: Batch processing, data pipelines, web crawlers, analytics
- Optimize for maximum concurrent operations
- Use batching and queuing
- Horizontal scaling
- Parallel processing (see the sketch below)
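A minimal sketch of the parallel-processing point: a thread pool leaves per-request latency unchanged while multiplying throughput. The URLs and the 100 ms per-call cost are assumptions:

```python
import concurrent.futures
import time

def fetch(url):
    time.sleep(0.1)  # stand-in for a 100 ms network call
    return url

urls = [f"https://example.com/page/{i}" for i in range(100)]  # hypothetical

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(fetch, urls))
elapsed = time.perf_counter() - start

# Per-request latency is still ~100 ms, but 20 workers push throughput
# from ~10 req/s (serial) to roughly 200 req/s.
print(f"{len(results)} requests in {elapsed:.2f} s "
      f"-> {len(results) / elapsed:.0f} req/s")
```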
Balanced Systems
Examples: Web applications, APIs, e-commerce, social media
- Need both reasonable latency and good throughput
- Use tiered optimization strategies
- Monitor both metrics closely
- Make trade-offs based on user impact
🧮 Performance Calculator
Compare latency vs throughput trade-offs for different system configurations.
Scaling Comparison
- Vertical Scaling: in the example configuration, becomes 31.6x more expensive due to premium hardware costs
- Horizontal Scaling: requires 10 servers, but costs scale linearly
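The 31.6x figure is consistent with a superlinear cost model such as cost ∝ capacity^1.5, since 10^1.5 ≈ 31.6. The model below is an assumption for illustration, not the calculator's actual formula:

```python
# Assumed cost model for illustration only: vertical scaling cost grows
# superlinearly with capacity (premium hardware); horizontal grows linearly.
VERTICAL_EXPONENT = 1.5  # assumption; 10 ** 1.5 ~= 31.6 matches the figure above

def vertical_cost(capacity_factor: float) -> float:
    return capacity_factor ** VERTICAL_EXPONENT

def horizontal_cost(capacity_factor: float) -> float:
    return float(capacity_factor)  # N identical servers: linear cost

for factor in (2, 5, 10):
    print(f"{factor:>2}x capacity -> vertical {vertical_cost(factor):5.1f}x cost, "
          f"horizontal {horizontal_cost(factor):4.1f}x cost")
```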
🎯 Latency vs Throughput Trade-off Scenarios
Learn from real-world decisions about optimizing for latency versus throughput
1. Real-time Trading System
Context
High-frequency trading platform prioritizing ultra-low latency
Outcome
Massive investment in low-latency infrastructure pays off through competitive advantage in millisecond-sensitive trading.
Key Lessons
- Co-location with exchanges reduces network latency
- Custom hardware (FPGAs) eliminates software overhead
- Dedicated network infrastructure bypasses internet routing
- Latency optimization can justify extreme costs in high-value scenarios
2. Video Streaming CDN
Context
Netflix-style platform optimizing for throughput while maintaining quality
Outcome
Accepted higher initial latency for massive throughput gains, enabling global scale while reducing costs.
Key Lessons
- Pre-caching popular content optimizes for throughput over latency
- Adaptive bitrate streaming balances quality and throughput
- Edge caching reduces latency for subsequent requests
- Batch processing of analytics trades real-time insights for efficiency
3. Social Media Feed Generation
Context
Facebook-style news feed balancing real-time updates with system capacity
Outcome
A hybrid approach: real-time delivery for critical updates, eventual consistency for feed ranking, balancing user experience with scale.
Key Lessons
- Pre-computed feeds improve latency for common access patterns (see the sketch below)
- Real-time updates for high-priority content (messages, comments)
- Eventual consistency acceptable for non-critical content
- Caching strategies reduce database load while maintaining freshness
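A minimal sketch of the pre-computed ("fan-out on write") feed pattern behind the first lesson; the in-memory data structures stand in for real storage:

```python
from collections import defaultdict, deque

FEED_LIMIT = 500
followers = defaultdict(set)  # author -> follower ids (hypothetical store)
feeds = defaultdict(deque)    # user -> pre-computed feed of post ids

def publish(author, post_id):
    # Fan-out on write: pay the cost once at post time so that reading
    # a feed is a cheap, low-latency lookup instead of a ranking query.
    for user in followers[author]:
        feed = feeds[user]
        feed.appendleft(post_id)
        while len(feed) > FEED_LIMIT:
            feed.pop()  # drop the oldest entry

def read_feed(user, limit=50):
    return list(feeds[user])[:limit]  # O(limit) at read time

followers["alice"] = {"bob", "carol"}
publish("alice", "post-1")
print(read_feed("bob"))
```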
4. E-commerce Search Engine
Context
Amazon-style product search optimizing for both speed and relevance
Outcome
Optimized for sub-200ms search latency while handling massive query volume, accepting delayed inventory updates.
Key Lessons
- Search index optimization is critical for both latency and throughput
- Autocomplete and suggestions reduce perceived latency
- Staged search results show fast categories first, then detailed results
- Inventory updates batched every few minutes rather than in real time, for performance
5. Gaming Backend Services
Context
Multiplayer game server balancing real-time gameplay with massive player counts
Outcome
Critical game actions prioritized for ultra-low latency, while analytics and social features use eventual consistency.
Key Lessons
- Game state updates require strict latency bounds for fair play
- Regional server deployment reduces network latency
- Non-critical features (stats, achievements) can accept higher latency
- UDP reduces latency compared with TCP for real-time data (see the sketch below)
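The UDP lesson can be seen in a few lines of standard-library socket code; the address and payload here are hypothetical:

```python
import socket

# UDP: no handshake and no retransmission. A position update goes out
# immediately, and a lost packet is simply superseded by the next one.
state_update = b'{"player": 7, "x": 10.5, "y": 3.2}'  # hypothetical payload

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(state_update, ("127.0.0.1", 9999))  # fire and forget
sock.close()

# TCP, by contrast, needs a 3-way handshake before the first byte, and
# head-of-line blocking can delay fresh updates behind retransmissions.
```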
6. Batch Analytics Pipeline
Context
Data warehouse processing choosing throughput over real-time insights
Outcome
Deliberately chose high latency in exchange for massive cost savings and throughput, meeting business needs for daily and hourly reports.
Key Lessons
- Batch processing maximizes resource utilization and reduces costs
- Time-based partitioning enables efficient large-scale processing (see the sketch below)
- Many business decisions don't require real-time data
- Reserved compute capacity is much cheaper than on-demand scaling
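A toy sketch of time-based partitioning: grouping records by hour so a batch job reads only the partitions in its window instead of scanning everything. The field names are assumptions:

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical event records with epoch-second timestamps.
events = [
    {"ts": 1700000000, "value": 1},
    {"ts": 1700003600, "value": 2},
    {"ts": 1700007200, "value": 3},
]

# Partition key = the hour the event occurred in (UTC).
partitions = defaultdict(list)
for event in events:
    hour = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    partitions[hour.strftime("%Y-%m-%d-%H")].append(event)

for key in sorted(partitions):
    print(key, len(partitions[key]), "events")
```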