Performance Metrics
Learn to measure what matters. Master the Four Golden Signals of monitoring and understand how top tech companies track system performance and user experience.
The Four Golden Signals (Google SRE)
Google's Site Reliability Engineering team identified four key metrics that provide comprehensive insight into system health. These signals form the foundation of effective monitoring.
Why These Four?
They cover the complete user experience: how fast (latency), how much load (traffic), how often it breaks (errors), and when it might break (saturation). Latency, traffic, and errors describe what users are experiencing now, while saturation gives early warning of future instability.
Latency
Time taken to serve a request
Traffic
Amount of demand on your system
Errors
Rate of failed requests
Saturation
How "full" your service is
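As a rough illustration, the four signals can be derived from one window of request records. This is a minimal sketch, not a prescribed implementation: the `Request` type, the `golden_signals` function, and the choice to express saturation as in-flight requests divided by capacity are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Request:
    duration_ms: float  # time taken to serve the request
    ok: bool            # False if the request failed


def golden_signals(requests, window_seconds, in_flight, capacity):
    """Summarize one monitoring window into the four signals.

    in_flight and capacity are hypothetical inputs; real systems may
    measure saturation via CPU, memory, queue depth, or connection pools.
    """
    total = len(requests)
    failed = sum(1 for r in requests if not r.ok)
    return {
        "latency_median_ms": median(r.duration_ms for r in requests) if total else 0.0,
        "traffic_rps": total / window_seconds,
        "error_rate": failed / total if total else 0.0,
        "saturation": in_flight / capacity,
    }


sample = [
    Request(100, True),
    Request(120, True),
    Request(80, True),
    Request(500, False),
]
signals = golden_signals(sample, window_seconds=2, in_flight=30, capacity=40)
print(signals)
```

In practice a monitoring system would report latency as percentiles instead of a single median, and would usually track latency of failed requests separately from successful ones.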
Real-World Performance Benchmarks
Performance standards from companies serving billions of users. These benchmarks represent world-class user experiences and are targets worth aspiring to.
Key Insight
Notice how these companies set aggressive performance targets. They understand that every millisecond matters for user experience and business outcomes. Speed is a competitive advantage.
Understanding Percentiles
Averages can be misleading. An average can look healthy while a meaningful share of requests are slow, and a handful of extreme outliers can skew it in either direction. Percentiles show how response times are distributed across all requests.
Why Percentiles Matter
Common Percentiles
P50 (Median)
Represents the typical user experience. Good for general performance tracking.
P95
Commonly used as the primary SLA metric because it balances user experience with engineering practicality.
P99
Catches edge cases and system stress. Critical for high-scale applications.
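The gap between the average and the percentiles is easiest to see on a small data set. The sketch below uses the nearest-rank method, one of several common percentile definitions; the `percentile` helper and the sample latencies are made up for illustration.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p percent of the samples are less than or equal to it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in ms: most requests are fast, a few are very slow.
latencies = [100] * 94 + [1000] * 4 + [5000] * 2

mean = sum(latencies) / len(latencies)
print("mean:", mean)                       # 234.0
print("P50:", percentile(latencies, 50))   # 100
print("P95:", percentile(latencies, 95))   # 1000
print("P99:", percentile(latencies, 99))   # 5000
```

Here the mean of 234 ms matches no request actually served: the typical request takes 100 ms, while the slowest few percent take 1 to 5 seconds.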
Performance Metrics Quick Reference
Essential Metrics
- ☐ P95 response time < 200ms
- ☐ Error rate < 0.1%
- ☐ Availability > 99.9%
- ☐ Apdex score > 0.85
- ☐ CPU utilization < 70%
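The Apdex target in the checklist can be computed from raw response times. This is a minimal sketch assuming the standard Apdex definition (satisfied at or below a threshold T, tolerating up to 4T, frustrated beyond that). The 500 ms threshold and the sample data are assumptions; the checklist does not state which threshold it has in mind.

```python
def apdex(durations_ms, threshold_ms):
    """Apdex = (satisfied + tolerating / 2) / total samples."""
    satisfied = sum(1 for d in durations_ms if d <= threshold_ms)
    tolerating = sum(
        1 for d in durations_ms if threshold_ms < d <= 4 * threshold_ms
    )
    return (satisfied + tolerating / 2) / len(durations_ms)


# Hypothetical sample: 80 fast, 15 tolerable, 5 frustrating responses.
durations = [100] * 80 + [600] * 15 + [3000] * 5
score = apdex(durations, threshold_ms=500)
print(score)          # 0.875
print(score > 0.85)   # True, so this sample meets the checklist target
```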
Monitoring Tools
- Prometheus + Grafana
- New Relic, Datadog (SaaS)
- CloudWatch (AWS)
- Application Performance Monitoring (APM)
- Synthetic monitoring / uptime checks
🧮 SLA Performance Calculator
Calculate the impact of different performance targets on user experience and costs
Result
99.9% availability allows up to 8.76 hours of downtime per year (0.1% of the 8,760 hours in a year)
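The arithmetic behind this figure is simple enough to reproduce for other targets. The sketch below assumes a 365-day year; the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent, period_hours=HOURS_PER_YEAR):
    """Downtime budget implied by an availability target over a period."""
    return (100 - availability_percent) / 100 * period_hours


for target in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

Each additional "nine" cuts the downtime budget by a factor of ten: roughly 87.6 hours at 99%, 8.76 hours at 99.9%, and about 53 minutes at 99.99%.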
🎯 Performance Metrics in Action
Real-world examples of how performance metrics drive business decisions
Scenarios
Context
Major retailer sees 40% drop in conversions during peak shopping season
Outcome
Performance optimization became the top priority. CDN implementation and database optimization reduced load times to under 1 second, recovering conversion rates.
Key Lessons
- A widely cited rule of thumb for e-commerce is that every 100 ms of added delay costs about 1% in conversions
- Peak traffic periods expose hidden performance bottlenecks
- An error rate above 1% indicates system stress requiring immediate attention
- Performance monitoring should trigger automatic alerts before users are affected
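The last two lessons can be combined into a simple two-level alert rule. This is a sketch under stated assumptions: the critical level uses the 1% figure above, the warning level reuses the 0.1% target from the quick reference, and the function and level names are hypothetical.

```python
def error_rate_alert(failed, total, warn_at=0.001, critical_at=0.01):
    """Classify an error rate for one monitoring window.

    warn_at (0.1%) is meant to fire before users notice;
    critical_at (1%) signals system stress needing immediate attention.
    """
    if total == 0:
        return "ok"  # no traffic in this window, nothing to evaluate
    rate = failed / total
    if rate > critical_at:
        return "critical"
    if rate > warn_at:
        return "warning"
    return "ok"


print(error_rate_alert(failed=0, total=1000))    # ok
print(error_rate_alert(failed=5, total=1000))    # warning
print(error_rate_alert(failed=20, total=1000))   # critical
```

A real alerting rule would usually also require the condition to hold for several consecutive windows, so that a brief spike on low traffic does not page anyone.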