What is Scalability?
Scalability is the ability of a system to handle increased load by adding resources to the system.
Key insight: A scalable system maintains performance as load increases. If doubling users doubles response time, your system isn't scalable.
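One rough way to put a number on this insight is to compare two load-test measurements. The sketch below is illustrative only: the function name `latency_growth` and the idea of reducing the comparison to a single ratio are assumptions, not part of any standard tool.

```python
def latency_growth(load_a, latency_a, load_b, latency_b):
    """Compare how fast latency grows relative to load between two measurements.

    Returns roughly 0.0 when latency stays flat as load grows (scales well)
    and roughly 1.0 when latency grows in proportion to load, which is the
    "doubling users doubles response time" case described above.
    """
    if load_a == load_b:
        raise ValueError("the two measurements must use different load levels")
    load_ratio = load_b / load_a
    latency_ratio = latency_b / latency_a
    return (latency_ratio - 1) / (load_ratio - 1)


# Hypothetical measurements: 1,000 users at 100 ms, then 2,000 users at 200 ms.
not_scalable = latency_growth(1000, 100, 2000, 200)
# Same load increase, but response time stays at 100 ms.
scalable = latency_growth(1000, 100, 2000, 100)
```

A value near 0 suggests the system is absorbing the extra load; a value near or above 1 suggests a bottleneck worth investigating.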
Vertical vs Horizontal Scaling
Vertical Scaling (Scale Up)
Add more power (CPU, RAM, faster disks) to your existing machine
Horizontal Scaling (Scale Out)
Add more machines to your pool of resources
Zero to Millions: The Complete Journey
Alex Xu's Framework: This progression shows how systems evolve from a single user to millions, with each stage introducing specific challenges and architectural solutions.
Architecture
- • Web app, database, cache on one server
- • DNS points to single IP address
- • Users connect directly to server
What Breaks First
- • Server CPU/memory limits (500-1K users)
- • Database connection pool exhaustion
- • Single point of failure
Architecture Changes
- • Separate web server and database server
- • Vertical scaling for both servers
- • Network connection between tiers
Benefits
- • Independent scaling of web/DB tiers
- • Better resource utilization
- • Supports 5-10x more users
New Components
- • Load balancer for multiple web servers
- • Cache layer (Redis/Memcached)
- • CDN for static assets
- • Session store (Redis/Database)
Performance Gains
- • 80-90% cache hit ratio
- • 10x faster static asset delivery
- • Horizontal scaling capability
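To make the cache hit ratio concrete, here is a minimal cache-aside sketch. It uses an in-memory dict as a stand-in for Redis or Memcached, and the names `CacheAside` and `effective_latency_ms` are hypothetical, chosen only for illustration.

```python
class CacheAside:
    """Cache-aside: check the cache first, fall back to the database on a miss."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # function that fetches a value from the database
        self._cache = {}           # in-memory stand-in for Redis/Memcached
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        value = self._load(key)    # slow path: go to the database
        self._cache[key] = value   # populate the cache for later reads
        return value

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def effective_latency_ms(hit_ratio, cache_ms, db_ms):
    """Average read latency as a weighted mix of cache hits and database reads."""
    return hit_ratio * cache_ms + (1 - hit_ratio) * db_ms


cache = CacheAside(load_from_db=lambda key: key.upper())  # fake database lookup
for key in ["a", "b", "a", "a", "b"]:
    cache.get(key)
ratio = cache.hit_ratio()  # 3 hits out of 5 lookups
```

With assumed latencies of 1 ms for the cache and 50 ms for the database, a 90% hit ratio gives an average of about 5.9 ms per read, which is why even a modest cache layer can take so much pressure off the database. A real deployment would also need expiry and invalidation, which this sketch omits.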
Database Strategy
- • Master-slave replication
- • Read replicas for scaling reads
- • Write-through cache for consistency
- • Connection pooling
Key Decisions
- • Split read/write traffic (90/10 ratio)
- • Choose SQL vs NoSQL
- • Plan for eventual consistency
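The read/write split can be sketched as a small router that sends writes to the primary and spreads reads across replicas. This is a simplified illustration, assuming connections are represented by plain strings and that any statement starting with SELECT is a read; `ReadWriteRouter` is a hypothetical name.

```python
import itertools


class ReadWriteRouter:
    """Route reads to replicas in round-robin order and everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # cycle() yields replicas in round-robin order forever.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        # Writes (and anything we are unsure about) go to the primary.
        return self.primary


router = ReadWriteRouter(primary="primary-db", replicas=["replica-1", "replica-2"])
first_read = router.route("SELECT * FROM users")
second_read = router.route("select * from orders")
write = router.route("INSERT INTO users (name) VALUES ('a')")
```

Because replicas lag behind the primary, a read issued right after a write may return stale data. That is the eventual consistency the list above asks you to plan for; a common workaround is to send read-your-own-write queries to the primary.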
Service Decomposition
- • Break monolith into services
- • Service mesh for communication
- • Message queues for async processing
- • Database per service
New Challenges
- • Distributed system complexity
- • Service discovery and config
- • Monitoring and logging
- • Data consistency across services
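As an illustration of message queues for async processing, the sketch below uses Python's standard `queue.Queue` and threads as a stand-in for a real broker such as RabbitMQ or Kafka. The helper names `start_workers` and `run_demo` are made up for this example.

```python
import queue
import threading


def start_workers(task_queue, handler, results, num_workers=2):
    """Start worker threads that pull tasks off the queue until they see None."""

    def worker():
        while True:
            task = task_queue.get()
            if task is None:          # sentinel value: time to shut down
                task_queue.task_done()
                break
            try:
                results.append(handler(task))
            finally:
                task_queue.task_done()  # mark the task finished even on error

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    return threads


def run_demo():
    tasks = queue.Queue()
    results = []
    threads = start_workers(tasks, handler=lambda n: n * n, results=results)
    for n in range(5):
        tasks.put(n)        # the producer returns immediately; work happens later
    for _ in threads:
        tasks.put(None)     # one shutdown sentinel per worker
    tasks.join()            # block until every queued item has been processed
    for thread in threads:
        thread.join()
    return sorted(results)  # completion order is not guaranteed, so sort
```

The producer never waits for the handler to run, which is the point of the pattern: slow work such as email or image processing moves off the request path. A production queue would add persistence, retries, and dead-letter handling, none of which appear here.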
💡 Key Takeaways
- • Scale incrementally: Don't build for 1M users when you have 1K
- • Database is usually the bottleneck: Plan your data layer carefully
- • Caching wins: A well-tuned cache can often serve 80% or more of read requests
- • Stateless design: Enables horizontal scaling
- • Monitor everything: You can't optimize what you can't measure
Real-world Scaling Examples
Netflix
Challenge: 230M+ subscribers, 15,000+ titles, global streaming
Solution: Microservices (700+), AWS auto-scaling, CDN (Netflix Open Connect)
WhatsApp
Challenge: 2B+ users, 100B+ messages/day with 55 engineers
Solution: Erlang (massive concurrency), minimal features, FreeBSD optimization
Discord
Challenge: Real-time messaging, voice chat, 150M+ active users
Solution: Elixir clusters, Rust for performance-critical parts, global edge nodes
Identifying Bottlenecks
Before scaling, you need to identify what's limiting your system's performance.
Load Testing & Metrics
Key Metrics to Monitor
- • Response Time: How long requests take
- • Throughput: Requests handled per second
- • Error Rate: Percentage of failed requests
- • Resource Usage: CPU, memory, disk, network
- • Concurrent Users: Active users at same time
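Several of these metrics can be derived from raw request samples. The sketch below assumes each sample is a `(latency_ms, ok)` tuple collected over a known time window; the function name `summarize` and the choice of a nearest-rank 95th percentile are illustrative assumptions.

```python
def summarize(samples, window_seconds):
    """Summarize request samples collected during one measurement window.

    samples: list of (latency_ms, ok) tuples, where ok is False for failed requests.
    """
    if not samples:
        raise ValueError("need at least one sample")
    latencies = sorted(latency for latency, _ in samples)
    failures = sum(1 for _, ok in samples if not ok)
    count = len(latencies)
    # Nearest-rank percentile: ceil(0.95 * count), done in integer arithmetic.
    p95_rank = (95 * count + 99) // 100
    return {
        "throughput_rps": count / window_seconds,
        "error_rate": failures / count,
        "avg_ms": sum(latencies) / count,
        "p95_ms": latencies[p95_rank - 1],
    }


# Hypothetical data: 20 requests in 10 seconds, latencies 10..200 ms, one failure.
samples = [(10 * i, i != 20) for i in range(1, 21)]
report = summarize(samples, window_seconds=10)
```

Tracking a high percentile alongside the average matters because averages hide tail latency: here the average is 105 ms, while the 95th percentile is 190 ms.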
Load Testing Types
- • Load Test: Expected normal traffic
- • Stress Test: Beyond normal capacity
- • Spike Test: Sudden traffic increases
- • Endurance Test: Extended periods
- • Volume Test: Large amounts of data
Common Scaling Mistakes
Premature Optimization
Building for 1M users when you have 100. Start simple, scale when needed.
Not Measuring
Guessing at bottlenecks instead of using monitoring and profiling tools.
Only Vertical Scaling
Throwing money at bigger servers instead of architecting for horizontal scale.
🧮 Scaling Cost Calculator
Compare the costs of vertical vs horizontal scaling strategies
Scaling Comparison
Vertical Scaling
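In the sample result, vertical scaling becomes 31.6x more expensive because premium hardware costs grow faster than capacity.
Horizontal Scaling
In the same result, horizontal scaling requires 10 servers, but its cost grows linearly with the number of servers.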
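The comparison can be reproduced with a toy cost model. The calculator's actual formula is not shown, so the exponent of 1.5 below is an assumption, chosen only because 10^1.5 ≈ 31.6 matches the figure quoted for a 10x capacity increase. The function names are hypothetical as well.

```python
import math


def vertical_cost(base_cost, scale_factor, exponent=1.5):
    """Cost of one bigger machine; exponent > 1 models the premium-hardware markup.

    The default exponent of 1.5 is an assumption, not a measured value.
    """
    return base_cost * scale_factor ** exponent


def horizontal_cost(base_cost, scale_factor):
    """Cost of many commodity machines: server count and total cost grow linearly."""
    servers = math.ceil(scale_factor)
    return servers, base_cost * servers


# Scale a hypothetical $100/month server to 10x capacity.
vertical = vertical_cost(100, 10)            # about 31.6x the base cost
servers, horizontal = horizontal_cost(100, 10)  # 10 servers at 10x the base cost
```

This model ignores the costs that horizontal scaling adds, such as load balancers, operational overhead, and the engineering work of making the application stateless, so treat the gap as an upper bound rather than a forecast.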
🎯 Real-world Scaling Scenarios
Explore how different companies handle scaling challenges
Context
Black Friday traffic spike: 10x normal load in 1 hour
Outcome
Without auto-scaling, the servers crashed, and the company lost $2M in sales during a 3-hour outage.
Key Lessons
- • Pre-scale infrastructure before known traffic spikes
- • Implement auto-scaling with aggressive scaling policies
- • Use CDN and caching to reduce server load
- • Load test with 10x expected traffic
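An auto-scaling policy can be as simple as a proportional rule: scale the server count by the ratio of observed utilization to target utilization. The sketch below is a simplified illustration in the spirit of the rule used by autoscalers such as the Kubernetes Horizontal Pod Autoscaler; the function name, the percent-based inputs, and the default limits are assumptions.

```python
import math


def desired_servers(current_servers, current_util_pct, target_util_pct,
                    min_servers=1, max_servers=100):
    """Proportional scaling rule: grow or shrink the fleet to hit the target utilization.

    Utilization values are percentages of a single server's capacity, so a
    value above 100 means the fleet is overloaded.
    """
    raw = math.ceil(current_servers * current_util_pct / target_util_pct)
    # Clamp to configured bounds so a bad metric cannot scale to zero or to infinity.
    return max(min_servers, min(max_servers, raw))


scale_out = desired_servers(4, current_util_pct=90, target_util_pct=60)  # 6 servers
scale_in = desired_servers(4, current_util_pct=30, target_util_pct=60)   # 2 servers
# A 10x spike on a 10-server fleet running at the 60% target hits the cap:
spike = desired_servers(10, current_util_pct=600, target_util_pct=60, max_servers=50)
```

The spike case shows why pre-scaling matters: a reactive policy only responds after utilization has climbed, and new servers take time to boot, so known events like Black Friday are safer to provision for in advance.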