Scalability Basics

15 min read · Beginner

What is Scalability?

Scalability is the ability of a system to handle increased load by adding resources.

Key insight: A scalable system maintains performance as load increases. If doubling users doubles response time, your system isn't scalable.

Vertical vs Horizontal Scaling

  • Max Capacity: 100 units (vertical) vs. 1,000 units (horizontal)
  • Initial Cost: $100 (vertical) vs. $300 (horizontal)
  • Complexity: 20/100 (vertical) vs. 80/100 (horizontal)

Vertical Scaling (Scale Up)

Cost: $$$
Complexity: Low

Add more power to your existing machine

Limit: Capped by the largest available machine
Example: Upgrade from 4GB to 32GB RAM

Horizontal Scaling (Scale Out)

Cost: $$
Complexity: High

Add more machines to your pool of resources

Limit: Nearly unlimited
Example: Add more web servers behind load balancer
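The core idea behind horizontal scaling can be sketched as a tiny round-robin dispatcher that spreads requests across a pool. The server names are hypothetical; a real deployment would use nginx, HAProxy, or a cloud load balancer:

```python
from itertools import cycle

# Hypothetical pool of web servers behind the load balancer
servers = ["web-1", "web-2", "web-3"]
rotation = cycle(servers)

def route_request(request_id: int) -> str:
    """Round-robin: each request goes to the next server in the pool."""
    return next(rotation)

assignments = [route_request(i) for i in range(6)]
print(assignments)  # each server handles every third request
```

Adding capacity is just appending another name to `servers`; no single machine needs to get bigger.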

Zero to Millions: The Complete Journey

Alex Xu's Framework: This progression shows how systems evolve from a single user to millions, with each stage introducing specific challenges and architectural solutions.

Stage 1: Single Server (1-1,000 users)

Architecture

  • Web app, database, cache on one server
  • DNS points to single IP address
  • Users connect directly to server

What Breaks First

  • Server CPU/memory limits (500-1K users)
  • Database connection pool exhaustion
  • Single point of failure

Stage 2: Database Separation (1K-10K users)

Architecture Changes

  • Separate web server and database server
  • Vertical scaling for both servers
  • Network connection between tiers

Benefits

  • Independent scaling of web/DB tiers
  • Better resource utilization
  • Supports 5-10x more users

Stage 3: Load Balancer + Cache (10K-100K users)

New Components

  • Load balancer for multiple web servers
  • Cache layer (Redis/Memcached)
  • CDN for static assets
  • Session store (Redis/Database)

Performance Gains

  • 80-90% cache hit ratio
  • 10x faster static asset delivery
  • Horizontal scaling capability
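The cache layer at this stage is typically used cache-aside: check the cache first, fall back to the database on a miss, then populate the cache for later reads. A minimal sketch, with a plain dict standing in for Redis and a hypothetical `fetch_user_from_db` lookup:

```python
cache = {}     # stands in for Redis/Memcached
db_reads = 0   # counts how often we hit the "database"

def fetch_user_from_db(user_id):
    """Hypothetical slow database lookup."""
    global db_reads
    db_reads += 1
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    """Cache-aside: serve from cache on a hit, populate it on a miss."""
    if user_id in cache:
        return cache[user_id]            # cache hit: no DB round trip
    user = fetch_user_from_db(user_id)   # cache miss: go to the database
    cache[user_id] = user
    return user

get_user(42); get_user(42); get_user(42)
print(db_reads)  # only the first read reaches the database
```

With an 80-90% hit ratio, most traffic never touches the database at all, which is why this stage buys so much headroom.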

Stage 4: Database Scaling (100K-1M users)

Database Strategy

  • Master-slave replication
  • Read replicas for scaling reads
  • Write-through cache for consistency
  • Connection pooling

Key Decisions

  • Split read/write traffic (often ~90% reads / 10% writes)
  • Choose SQL vs NoSQL
  • Plan for eventual consistency
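Splitting read/write traffic usually happens in a thin routing layer: writes go to the primary, reads are spread across replicas. A minimal sketch, with string labels standing in for real connection pools:

```python
import random

# Hypothetical connection labels; real code would hold DB connection pools
PRIMARY = "primary"
REPLICAS = ["replica-1", "replica-2"]

def route_query(sql: str) -> str:
    """Send writes to the primary, spread reads across replicas."""
    verb = sql.lstrip().split()[0].upper()
    if verb in {"INSERT", "UPDATE", "DELETE"}:
        return PRIMARY        # writes must hit the single source of truth
    return random.choice(REPLICAS)  # reads can go to any replica

assert route_query("UPDATE users SET name = 'x'") == PRIMARY
print(route_query("SELECT * FROM users"))
```

Note the eventual-consistency caveat from the list above: a read routed to a replica may briefly miss a write that just landed on the primary.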

Stage 5: Microservices (1M+ users)

Service Decomposition

  • Break monolith into services
  • Service mesh for communication
  • Message queues for async processing
  • Database per service

New Challenges

  • Distributed system complexity
  • Service discovery and config
  • Monitoring and logging
  • Data consistency across services
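The "message queues for async processing" idea can be sketched in a few lines: the web request only enqueues work and returns fast, while a background worker consumes at its own pace. A deque stands in for a broker such as RabbitMQ or Kafka; the order handler is hypothetical:

```python
from collections import deque

queue = deque()   # stands in for a message broker
processed = []    # work completed by the background worker

def place_order(order_id):
    """The web request only enqueues work and returns immediately."""
    queue.append({"order": order_id})

def worker_drain():
    """A background worker consumes messages at its own pace."""
    while queue:
        processed.append(queue.popleft()["order"])

place_order(1); place_order(2)
worker_drain()
print(processed)
```

Decoupling producer and consumer this way is what lets a traffic spike pile up in the queue instead of overwhelming downstream services.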

💡 Key Takeaways

  • Scale incrementally: Don't build for 1M users when you have 1K
  • Database is usually the bottleneck: Plan your data layer carefully
  • Caching wins: 80% of requests can be served from cache
  • Stateless design: Enables horizontal scaling
  • Monitor everything: You can't optimize what you can't measure

Identifying Bottlenecks

Before scaling, you need to identify what's limiting your system's performance.

CPU
Warning: High CPU usage (>80%)
Solution: More cores or better algorithms

Memory
Warning: High RAM usage, swapping
Solution: More RAM or data structure optimization

Storage
Warning: Slow disk I/O, high latency
Solution: SSDs, caching, or database optimization

Network
Warning: High bandwidth usage, packet loss
Solution: CDN, compression, or load balancing

Database
Warning: Slow queries, connection limits
Solution: Indexing, read replicas, or sharding
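In practice, bottleneck identification is just comparing live readings against warning thresholds like the ones listed above. A minimal sketch with sample readings; the memory and disk thresholds here are illustrative, only the 80% CPU figure comes from the list:

```python
# Warning thresholds; CPU matches the >80% rule above, the rest are illustrative
THRESHOLDS = {"cpu": 0.80, "memory": 0.85, "disk_latency_ms": 20}

# Hypothetical current readings from a monitoring agent
readings = {"cpu": 0.92, "memory": 0.60, "disk_latency_ms": 35}

def find_bottlenecks(readings, thresholds):
    """Flag any resource whose reading exceeds its warning threshold."""
    return [name for name, value in readings.items() if value > thresholds[name]]

print(find_bottlenecks(readings, THRESHOLDS))  # flags cpu and disk latency
```

Real systems get these readings from tools like Prometheus or CloudWatch rather than a hand-built dict, but the threshold comparison is the same.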

Load Testing & Metrics

Key Metrics to Monitor

  • Response Time: How long requests take
  • Throughput: Requests handled per second
  • Error Rate: Percentage of failed requests
  • Resource Usage: CPU, memory, disk, network
  • Concurrent Users: Active users at same time

Load Testing Types

  • Load Test: Expected normal traffic
  • Stress Test: Beyond normal capacity
  • Spike Test: Sudden traffic increases
  • Endurance Test: Extended periods
  • Volume Test: Large amounts of data
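Response time is usually reported as percentiles (p50/p95/p99) rather than averages, because averages hide the slow tail that users actually feel. A minimal nearest-rank percentile sketch over synthetic latency samples:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ordered = sorted(samples)
    rank = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[rank]

# Synthetic latencies: mostly fast, with a slow tail
latencies_ms = [12, 15, 11, 14, 13, 12, 16, 250, 13, 900]

print("p50:", percentile(latencies_ms, 50))
print("p95:", percentile(latencies_ms, 95))
print("p99:", percentile(latencies_ms, 99))
```

Here the median looks healthy while the high percentiles expose the outliers, which is exactly the signal a load test is looking for.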

Scaling Timeline

1-1K Users: Single Server
Monolith app, single database, simple deployment

1K-10K Users: Add Caching
Redis/Memcached, CDN for static assets

10K-100K Users: Database Scaling
Read replicas, connection pooling

100K-1M Users: Load Balancing
Multiple app servers, session management

1M+ Users: Microservices
Service decomposition, message queues

Common Scaling Mistakes

Premature Optimization

Building for 1M users when you have 100. Start simple, scale when needed.

Not Measuring

Guessing at bottlenecks instead of using monitoring and profiling tools.

Only Vertical Scaling

Throwing money at bigger servers instead of architecting for horizontal scale.

🧮 Scaling Cost Calculator

Compare the costs of vertical vs horizontal scaling strategies

Scaling Comparison

Vertical Scaling: $3,162/month
Vertical scaling becomes 31.6x more expensive due to premium hardware costs.
  • Scale Factor: 10x
  • Cost Multiplier: 31.6x
  • Total Servers: 1 (bigger)

Horizontal Scaling: $1,000/month
Horizontal scaling requires 10 servers, but cost scales linearly.
  • Scale Factor: 10x
  • Number of Servers: 10
  • Cost per Server: $100

💡 Quick Comparison: Horizontal scaling is 3.2x cheaper for this scenario
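These numbers follow from a simple cost model: horizontal cost grows linearly with the scale factor, while vertical cost grows superlinearly. The 1.5 exponent below is an assumption chosen to match the 31.6x figure (10^1.5 ≈ 31.6), not a law of hardware pricing:

```python
BASE_COST = 100.0        # $/month for the baseline server
SCALE = 10               # scale factor (10x load)
VERTICAL_EXPONENT = 1.5  # assumed premium-hardware penalty; 10**1.5 ≈ 31.6

vertical_cost = BASE_COST * SCALE ** VERTICAL_EXPONENT  # one much bigger box
horizontal_cost = BASE_COST * SCALE                     # N commodity servers

print(f"vertical:   ${vertical_cost:,.0f}/month")
print(f"horizontal: ${horizontal_cost:,.0f}/month")
print(f"horizontal is {vertical_cost / horizontal_cost:.1f}x cheaper")
```

Real pricing curves vary by vendor, but the qualitative point holds: top-end machines carry a steep premium per unit of capacity.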

🎯 Real-world Scaling Scenarios

Explore how different companies handle scaling challenges

Scenarios

  • E-commerce Flash Sale: Black Friday traffic spike, 10x normal load in 1 hour
  • Social Media Viral Post: Unexpected viral content, 100x traffic in 30 minutes
  • Gaming Launch Day: New game release, sustained high load for 48 hours
  • Streaming Service Super Bowl: Live sports streaming, 50M concurrent viewers during halftime
  • Banking System Payment Rush: Tax deadline day, 10x increase in money transfers and payments
  • News Site Breaking Story: Major world event, traffic from 10K to 2M users in 20 minutes
  • Ride-sharing New Year's Eve: Peak demand night, 50x normal ride requests in major cities
  • Cloud Storage Backup Sunday: Weekly backup day, 100TB+ of data uploaded simultaneously

Context: E-commerce Flash Sale

Black Friday traffic spike: 10x normal load in 1 hour

Metrics

  • Normal Traffic: 1K RPS
  • Peak Traffic: 10K RPS
  • Response Time: 2.5s
  • Error Rate: 15%

Outcome

Without auto-scaling, servers crashed. Lost $2M in sales during 3-hour outage.

Key Lessons

  • Pre-scale infrastructure before known traffic spikes
  • Implement auto-scaling with aggressive scaling policies
  • Use CDN and caching to reduce server load
  • Load test with 10x expected traffic
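An "aggressive scaling policy" like the one in the lessons above boils down to a simple asymmetric rule: scale out fast when utilization climbs, scale in slowly when it drops. The thresholds and step sizes here are illustrative, not from any particular cloud provider:

```python
def desired_servers(current: int, cpu_utilization: float) -> int:
    """Scale out aggressively above 60% CPU, scale in cautiously below 30%."""
    if cpu_utilization > 0.60:
        return current * 2          # double capacity ahead of the spike
    if cpu_utilization < 0.30:
        return max(1, current - 1)  # shed one server at a time
    return current                  # comfortable band: hold steady

assert desired_servers(4, 0.75) == 8   # spike: double quickly
assert desired_servers(4, 0.20) == 3   # quiet: drain slowly
```

The asymmetry matters: adding capacity too late loses sales (as in the outcome above), while removing it too eagerly causes flapping.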

📝 Scalability Knowledge Check


Which scaling approach typically has lower complexity but higher cost per unit of capacity?