Rate Limiting
Master traffic control patterns that protect APIs and services from overload while maintaining fair resource allocation
Rate Limiting Fundamentals
Rate limiting is a technique for controlling how many requests a client can send to, or receive from, a service within a given time period. It is used to prevent abuse, ensure fair resource allocation, and maintain service quality under load.
🎯 Primary Goals
- Prevent API abuse and DoS attacks
- Ensure fair resource allocation among users
- Maintain service stability under load
- Protect downstream services
- Monetize API usage through tiered limits
- Comply with third-party API limits
🏗️ Key Components
- Identifier: User, IP, API key, or service
- Limit: Maximum requests per time window
- Window: Time period for counting requests
- Counter: Current request count
- Action: Allow, deny, or throttle
- Response: Headers and error messages
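The components above can be tied together in a tiny illustrative data structure. This is a sketch for orientation only; the names (`RateLimitRule`, `decide`) are invented here, not taken from any library, and throttling is omitted for brevity.

```python
from dataclasses import dataclass

@dataclass
class RateLimitRule:
    """The key components of a rate limiter as plain data (illustrative)."""
    identifier: str      # user, IP, API key, or service
    limit: int           # maximum requests per time window
    window_seconds: int  # time period for counting requests
    count: int = 0       # current request count

    def decide(self) -> str:
        # Action: allow or deny based on the counter
        if self.count < self.limit:
            self.count += 1
            return "allow"
        return "deny"
```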
Rate Limiting Algorithms
🪣 Token Bucket
Tokens are added to a bucket at a fixed rate. Each request consumes a token.
How it works:
- Bucket holds up to N tokens
- Tokens added at rate R per second
- Each request consumes 1 token
- Request denied if no tokens available
Advantages:
- Allows burst traffic
- Simple to implement
- Memory efficient
Use cases:
API gateways, CDNs, traffic shaping
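The steps above can be sketched as a small in-process class. This single-threaded sketch assumes one limiter instance; a production limiter would need locking or an atomic shared store.

```python
import time

class TokenBucket:
    """Minimal in-process token bucket (single-threaded sketch)."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity          # max tokens N (burst size)
        self.refill_rate = refill_rate    # tokens added per second (rate R)
        self.tokens = capacity            # bucket starts full
        self.last_refill = time.monotonic()

    def allow_request(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost           # each request consumes a token
            return True
        return False                      # denied: no tokens available
```

Because the bucket starts full, a burst of up to `capacity` requests is admitted immediately, after which requests are limited to the refill rate.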
💧 Leaky Bucket
Requests are processed at a fixed rate; overflow requests are dropped.
How it works:
- Requests enter bucket (queue)
- Processed at fixed rate
- Bucket has maximum capacity
- Overflow requests are dropped
Advantages:
- Smooth output rate
- No burst traffic
- Predictable behavior
Use cases:
Network QoS, streaming services, rate smoothing
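The queue-based behavior described above can be sketched as follows. This is an in-process sketch; real implementations would drain the queue on a timer and hand requests to a worker rather than dropping them synchronously.

```python
import time
from collections import deque

class LeakyBucket:
    """Leaky bucket as a bounded queue drained at a fixed rate (sketch)."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity      # maximum queued requests
        self.leak_rate = leak_rate    # requests drained per second
        self.queue = deque()
        self.last_leak = time.monotonic()

    def _leak(self):
        now = time.monotonic()
        drained = int((now - self.last_leak) * self.leak_rate)
        if drained > 0:
            for _ in range(min(drained, len(self.queue))):
                self.queue.popleft()  # request forwarded downstream
            # Advance by whole drain steps so fractional time is kept
            self.last_leak += drained / self.leak_rate

    def try_enqueue(self, request) -> bool:
        self._leak()
        if len(self.queue) < self.capacity:
            self.queue.append(request)
            return True
        return False                  # overflow: request dropped
```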
🪟 Fixed Window
Count requests in fixed time windows (e.g., per minute).
How it works:
- Define time window (e.g., 1 minute)
- Count requests in current window
- Reset counter at window boundary
- Deny if counter exceeds limit
Problem:
Boundary effect - users can send 2x limit around window boundaries
Use cases:
Simple APIs, basic protection, logging systems
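A fixed-window counter is only a few lines; the sketch below accepts an injected timestamp so the boundary effect is easy to demonstrate. With a limit of 2 per 60-second window, two requests at t=59 and two more at t=61 are all allowed, i.e. 4 requests in about 2 seconds.

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Fixed-window counter keyed by client identifier (sketch)."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counters = defaultdict(int)  # (client, window start) -> count

    def allow_request(self, client: str, now: float = None) -> bool:
        now = time.time() if now is None else now
        # The counter implicitly resets at each window boundary,
        # because the key changes when a new window starts.
        window_start = int(now // self.window) * self.window
        key = (client, window_start)
        if self.counters[key] < self.limit:
            self.counters[key] += 1
            return True
        return False
```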
🎢 Sliding Window
Continuously track requests over a sliding time window.
How it works:
- Maintain request timestamps
- Remove old requests outside window
- Count remaining requests
- Deny if count exceeds limit
Advantages:
- Accurate rate limiting
- No boundary effects
- Smooth operation
Disadvantages:
Higher memory usage, more complex implementation
Algorithm Comparison
| Algorithm | Burst Traffic | Memory Usage | Accuracy | Implementation | Best For |
|---|---|---|---|---|---|
| Token Bucket | ✅ Allowed | Low | Good | Simple | API gateways |
| Leaky Bucket | ❌ Not allowed | Medium | High | Medium | QoS systems |
| Fixed Window | ⚠️ Boundary issues | Very Low | Poor | Very Simple | Basic protection |
| Sliding Window | ⚠️ Limited | High | Excellent | Complex | Precise control |
Implementation Examples
🪣 Token Bucket (Redis)
Token Bucket Rate Limiter (Redis Lua Script)
-- Atomic token bucket check, run via EVAL so the read-modify-write
-- happens as a single step (no race between API instances).
-- KEYS[1]: bucket key; ARGV[1]: capacity;
-- ARGV[2]: tokens added per interval; ARGV[3]: interval in seconds.
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_per_interval = tonumber(ARGV[2])
local interval = tonumber(ARGV[3])

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local current_tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or 0

-- Refill proportionally to the time elapsed since the last request
local now = tonumber(redis.call('TIME')[1])
local elapsed = math.max(0, now - last_refill)
current_tokens = math.min(capacity,
    current_tokens + elapsed * refill_per_interval / interval)

if current_tokens >= 1 then
    current_tokens = current_tokens - 1
    redis.call('HMSET', key, 'tokens', current_tokens, 'last_refill', now)
    redis.call('EXPIRE', key, interval * 2)  -- let idle buckets expire
    return 1  -- allowed
else
    return 0  -- denied
end
🎢 Sliding Window (Python)
Sliding Window Rate Limiter Implementation
import time
from collections import deque

class SlidingWindowRateLimiter:
    """Tracks individual request timestamps over a sliding window."""

    def __init__(self, limit, window_size):
        self.limit = limit              # max requests per window
        self.window_size = window_size  # window length in seconds
        self.requests = deque()         # timestamps of accepted requests

    def allow_request(self):
        now = time.time()
        # Drop timestamps that have fallen out of the window
        while self.requests and self.requests[0] <= now - self.window_size:
            self.requests.popleft()
        # Admit the request only if the window still has room
        if len(self.requests) < self.limit:
            self.requests.append(now)
            return True
        return False
Distributed Rate Limiting
⚠️ Challenges
Race Conditions:
Multiple instances updating counters simultaneously
Synchronization Latency:
Network delays in distributed counter updates
Partial Failures:
Some nodes may have stale counter information
Hot Partitioning:
Popular users creating hotspots in storage
✅ Solutions
Centralized Storage:
Redis/Hazelcast for shared state
Sticky Sessions:
Route users to same instance
Approximate Counters:
Accept some inaccuracy for performance
Hierarchical Limits:
Global + local rate limits
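One widely used compromise behind "approximate counters" is the sliding window counter: keep only two fixed-window counters and weight the previous window by how much of it still overlaps the sliding window. The sketch below is in-process for clarity; in a distributed setup the two counters would live in a shared store such as Redis.

```python
import time

class SlidingWindowCounter:
    """Approximate sliding window from two fixed-window counters (sketch)."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.current_start = 0.0
        self.current_count = 0
        self.previous_count = 0

    def allow_request(self, now: float = None) -> bool:
        now = time.time() if now is None else now
        window_start = (now // self.window) * self.window
        if window_start != self.current_start:
            # Roll windows forward; anything older than one full
            # window behind the current one is discarded entirely.
            if window_start - self.current_start >= 2 * self.window:
                self.previous_count = 0
            else:
                self.previous_count = self.current_count
            self.current_count = 0
            self.current_start = window_start
        # Weight the previous window by its remaining overlap
        overlap = 1.0 - (now - window_start) / self.window
        estimated = self.previous_count * overlap + self.current_count
        if estimated < self.limit:
            self.current_count += 1
            return True
        return False
```

This trades a little accuracy (it assumes requests were spread evenly across the previous window) for constant memory per key, which is why it suits shared stores well.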
🏗️ Distributed Architecture
Load Balancer
(routes requests to instances)
↓
API Instance 1 | API Instance 2 | API Instance N
(each instance keeps local counters)
↓
Centralized Store (Redis)
(shared rate limit state)
Rate Limiting Strategies
👤 User-Based
Identification:
- API keys
- User authentication tokens
- OAuth client IDs
- Account identifiers
Advantages:
- Fair allocation per user
- Supports tiered pricing
- Prevents single user abuse
Use Cases:
SaaS APIs, freemium models
🌐 IP-Based
Identification:
- Source IP addresses
- IP ranges/subnets
- Geo-location based
- ASN-based limits
Advantages:
- Simple implementation
- Works for anonymous users
- DDoS protection
Challenges:
NAT, CDNs, shared IPs
🔧 Resource-Based
Granularity:
- Per endpoint/method
- Per operation type
- Per resource category
- Compute cost-based
Advantages:
- Protects expensive operations
- Fine-grained control
- Resource-aware limiting
Examples:
Search: 10 RPS, Upload: 1 RPS
Rate Limit Response Strategies
🚫 Hard Limits
Behavior:
Immediately reject requests that exceed limits
HTTP Response:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1640995200
Retry-After: 60
Use Cases:
- Protection against abuse
- Resource conservation
- Cost control
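On the client side, these response headers drive retry behavior: honor `Retry-After` when present, and otherwise fall back to exponential backoff with jitter. A minimal sketch, where `send` is any callable returning a `(status_code, headers)` pair (injected here so the example stays independent of any HTTP library):

```python
import random
import time

def call_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry on 429, honoring Retry-After when the server provides it."""
    status = None
    for attempt in range(max_retries):
        status, headers = send()
        if status != 429:
            return status
        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            # Server told us exactly how long to wait
            delay = float(retry_after)
        else:
            # Exponential backoff with a little jitter to avoid
            # synchronized retries from many clients
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
        sleep(delay)
    return status  # give up after max_retries
```

Injecting `sleep` also makes the backoff schedule testable without real waiting.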
🐌 Throttling
Behavior:
Slow down requests instead of rejecting them
Techniques:
- Add artificial delays
- Queue requests
- Reduce response size
- Lower quality results
Use Cases:
- Better user experience
- Gradual degradation
- Revenue preservation
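The "artificial delay" technique above can be sketched as a pacing limiter: instead of rejecting a request that arrives too early, it computes how long to wait so requests leave at the target rate. This sketch blocks the calling thread; `now` and `sleep` are injectable for testing.

```python
import time

class ThrottlingLimiter:
    """Delay requests so the observed rate approaches the target (sketch)."""

    def __init__(self, rate: float):
        self.min_interval = 1.0 / rate  # seconds between requests
        self.next_allowed = 0.0         # earliest time the next request may run

    def acquire(self, now: float = None, sleep=time.sleep) -> float:
        """Admit one request, returning the delay applied (0.0 if none)."""
        now = time.monotonic() if now is None else now
        delay = max(0.0, self.next_allowed - now)
        # Schedule the slot after this one, whichever is later:
        # the current time or the previously reserved slot.
        self.next_allowed = max(now, self.next_allowed) + self.min_interval
        if delay > 0:
            sleep(delay)
        return delay
```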
Monitoring Rate Limiting
📊 Key Metrics
- Rate limit hit rate by endpoint
- Top rate-limited users/IPs
- Request acceptance vs rejection ratio
- Average rate limit utilization
- Peak usage patterns
- Geographic distribution of limits hit
- Cost per rate-limited request
🚨 Alerts
- Spike in rate limit violations
- Unusual traffic patterns
- Rate limiting system failure
- High rejection rates impacting revenue
- Configuration changes
- Storage system latency spikes
- DDoS attack patterns
📈 Analytics
- User behavior analysis
- Optimal limit recommendations
- Revenue impact of rate limiting
- Conversion rates by rate limit tier
- Seasonal traffic patterns
- A/B testing rate limit values
- Churn correlation with limits
Rate Limiting Best Practices
✅ Do's
- Use meaningful HTTP headers (X-RateLimit-*)
- Implement graceful degradation
- Provide clear error messages
- Allow burst traffic with token bucket
- Monitor rate limit effectiveness
- Use different limits for different operations
- Implement retry logic with exponential backoff
- Consider user experience in limit design
❌ Don'ts
- Don't set limits too aggressively
- Don't ignore legitimate high-usage scenarios
- Don't implement rate limiting without monitoring
- Don't use the same limits for all users
- Don't forget to handle distributed synchronization
- Don't ignore the performance impact
- Don't make error messages too cryptic
- Don't rate limit health checks
Real-World Rate Limiting
🐦 Twitter API
Strategy: User + endpoint-based limits with time windows
Limits:
- Tweet posting: 300 per 15 minutes
- Following: 400 per 24 hours
- Direct messages: 1000 per 24 hours
Implementation: Token bucket with user-specific buckets
🗺️ Google Maps API
Strategy: Quota-based with different limits per API
Limits:
- Geocoding: 50 RPS
- Directions: 50 RPS
- Places: 100 RPS
Features: Burst allowance, per-user quotas, billing integration
📧 SendGrid API
Strategy: Tiered limits based on subscription plan
Limits:
- Free: 100 emails/day
- Essentials: 40,000 emails/day
- Pro: 120,000 emails/day
Features: Burst handling, detailed usage analytics
🛍️ Shopify API
Strategy: Leaky bucket with call credit system
Limits:
- REST API: 2 RPS (40 call credit bucket)
- GraphQL: Query complexity-based limits
Features: Call credit refill, complexity scoring