Rate Limiting

Master traffic control patterns that protect APIs and services from overload while maintaining fair resource allocation



Rate Limiting Fundamentals

Rate limiting is a technique for controlling the rate of requests a client may send to, or a service will accept from, the network. It's used to prevent abuse, ensure fair resource allocation, and maintain service quality.

🎯 Primary Goals

  • Prevent API abuse and DoS attacks
  • Ensure fair resource allocation among users
  • Maintain service stability under load
  • Protect downstream services
  • Monetize API usage through tiered limits
  • Comply with third-party API limits

πŸ—οΈ Key Components

  • Identifier: User, IP, API key, or service
  • Limit: Maximum requests per time window
  • Window: Time period for counting requests
  • Counter: Current request count
  • Action: Allow, deny, or throttle
  • Response: Headers and error messages

Rate Limiting Algorithms

🪣 Token Bucket

Tokens are added to a bucket at a fixed rate. Each request consumes a token.
How it works:
  • Bucket holds up to N tokens
  • Tokens added at rate R per second
  • Each request consumes 1 token
  • Request denied if no tokens available
Advantages:
  • Allows burst traffic
  • Simple to implement
  • Memory efficient
Use cases:
API gateways, CDNs, traffic shaping
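The steps above can be sketched as a minimal single-process token bucket in Python (class and method names are illustrative; the Redis example later in this section covers the distributed case):

```python
import time

class TokenBucket:
    """Refills `rate` tokens per second up to `capacity`; each request costs one token."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow_request(self):
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Because the bucket starts full, a fresh client can burst up to `capacity` requests at once before being held to the steady refill rate.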

💧 Leaky Bucket

Requests are processed at a fixed rate; overflow requests are dropped.
How it works:
  • Requests enter bucket (queue)
  • Processed at fixed rate
  • Bucket has maximum capacity
  • Overflow requests are dropped
Advantages:
  • Smooth output rate
  • No burst traffic
  • Predictable behavior
Use cases:
Network QoS, streaming services, rate smoothing
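The queue-and-drain behavior described above can be sketched in-process (names and the monotonic-clock choice are illustrative assumptions):

```python
import time
from collections import deque

class LeakyBucket:
    """Queue up to `capacity` requests; drain them at `rate` per second."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.queue = deque()
        self.last_leak = time.monotonic()

    def _leak(self):
        now = time.monotonic()
        # Number of requests drained since the last check
        leaked = int((now - self.last_leak) * self.rate)
        if leaked > 0:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last_leak = now

    def try_enqueue(self):
        self._leak()
        if len(self.queue) < self.capacity:
            self.queue.append(time.monotonic())
            return True
        return False  # overflow: request dropped
```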

🪟 Fixed Window

Count requests in fixed time windows (e.g., per minute).
How it works:
  • Define time window (e.g., 1 minute)
  • Count requests in current window
  • Reset counter at window boundary
  • Deny if counter exceeds limit
Problem:
Boundary effect: a user can send up to 2x the limit in a short burst straddling a window boundary
Use cases:
Simple APIs, basic protection, logging systems
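The counter-and-reset logic above, as a minimal single-process sketch (class name is illustrative):

```python
import time

class FixedWindowRateLimiter:
    """Counts requests per fixed window; the counter resets at each boundary."""

    def __init__(self, limit, window_size):
        self.limit = limit
        self.window_size = window_size
        self.window_start = time.monotonic()
        self.count = 0

    def allow_request(self):
        now = time.monotonic()
        if now - self.window_start >= self.window_size:
            # Window boundary crossed: start a new window
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

Note that this sketch exhibits exactly the boundary effect described above: a client can exhaust one window just before the reset and the next window just after it.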

🎢 Sliding Window

Continuously track requests over a sliding time window.
How it works:
  • Maintain request timestamps
  • Remove old requests outside window
  • Count remaining requests
  • Deny if count exceeds limit
Advantages:
  • Accurate rate limiting
  • No boundary effects
  • Smooth operation
Disadvantages:
Higher memory usage, more complex implementation

Algorithm Comparison

Algorithm       | Burst Traffic      | Memory Usage | Accuracy  | Implementation | Best For
Token Bucket    | ✅ Allowed         | Low          | Good      | Simple         | API gateways
Leaky Bucket    | ❌ Not allowed     | Medium       | High      | Medium         | QoS systems
Fixed Window    | ⚠️ Boundary issues | Very Low     | Poor      | Very Simple    | Basic protection
Sliding Window  | ⚠️ Limited         | High         | Excellent | Complex        | Precise control

Implementation Examples

🪣 Token Bucket (Redis)

Token Bucket Rate Limiter (Redis Lua Script)
-- Atomic token-bucket check-and-consume, executed as a Redis Lua script
-- KEYS[1]: bucket key
-- ARGV[1]: capacity, ARGV[2]: tokens added per interval, ARGV[3]: interval (seconds)
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill = tonumber(ARGV[2])
local interval = tonumber(ARGV[3])

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local current_tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or 0

-- Refill in proportion to the time elapsed since the last request
local now = tonumber(redis.call('TIME')[1])
local elapsed = now - last_refill
current_tokens = math.min(capacity,
    current_tokens + elapsed * refill / interval)

if current_tokens >= 1 then
  current_tokens = current_tokens - 1
  redis.call('HMSET', key, 'tokens', current_tokens, 'last_refill', now)
  redis.call('EXPIRE', key, interval * 2)
  return 1  -- allowed
else
  return 0  -- denied
end

🪟 Sliding Window (Python)

Sliding Window Rate Limiter Implementation
import time
from collections import deque

class SlidingWindowRateLimiter:
    def __init__(self, limit, window_size):
        self.limit = limit
        self.window_size = window_size
        self.requests = deque()

    def allow_request(self):
        now = time.time()
        # Remove old requests
        while (self.requests and 
               self.requests[0] <= now - self.window_size):
            self.requests.popleft()

        # Check if limit exceeded
        if len(self.requests) < self.limit:
            self.requests.append(now)
            return True
        return False

Distributed Rate Limiting

⚠️ Challenges

Race Conditions:
Multiple instances updating counters simultaneously
Synchronization Latency:
Network delays in distributed counter updates
Partial Failures:
Some nodes may have stale counter information
Hot Partitioning:
Popular users creating hotspots in storage

✅ Solutions

Centralized Storage:
Redis/Hazelcast for shared state
Sticky Sessions:
Route users to same instance
Approximate Counters:
Accept some inaccuracy for performance
Hierarchical Limits:
Global + local rate limits
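The hierarchical-limits idea above can be sketched in-process. In a real deployment the global counter would live in shared storage such as Redis; here both levels are plain dicts purely for illustration, and all names are assumptions:

```python
import time

class HierarchicalLimiter:
    """Check a per-user (local) limit first, then a shared global limit."""

    def __init__(self, user_limit, global_limit, window_size):
        self.user_limit = user_limit
        self.global_limit = global_limit
        self.window_size = window_size
        self.window_start = time.monotonic()
        self.user_counts = {}
        self.global_count = 0

    def allow_request(self, user_id):
        now = time.monotonic()
        if now - self.window_start >= self.window_size:
            # New window: reset both levels
            self.window_start = now
            self.user_counts.clear()
            self.global_count = 0
        if self.user_counts.get(user_id, 0) >= self.user_limit:
            return False  # per-user limit hit
        if self.global_count >= self.global_limit:
            return False  # global capacity exhausted
        self.user_counts[user_id] = self.user_counts.get(user_id, 0) + 1
        self.global_count += 1
        return True
```

Checking the cheap local limit first means most rejections never touch the shared counter, which reduces load on the centralized store.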

πŸ—οΈ Distributed Architecture

Load Balancer (routes requests to instances)
        ↓
API Instance 1   API Instance 2   ...   API Instance N
(each instance keeps local counters)
        ↕
Centralized Store (Redis) holding shared rate limit state

Rate Limiting Strategies

👤 User-Based

Identification:
  • API keys
  • User authentication tokens
  • OAuth client IDs
  • Account identifiers
Advantages:
  • Fair allocation per user
  • Supports tiered pricing
  • Prevents single user abuse
Use Cases:
SaaS APIs, freemium models

🌐 IP-Based

Identification:
  • Source IP addresses
  • IP ranges/subnets
  • Geo-location based
  • ASN-based limits
Advantages:
  • Simple implementation
  • Works for anonymous users
  • DDoS protection
Challenges:
NAT, CDNs, shared IPs

🔧 Resource-Based

Granularity:
  • Per endpoint/method
  • Per operation type
  • Per resource category
  • Compute cost-based
Advantages:
  • Protects expensive operations
  • Fine-grained control
  • Resource-aware limiting
Examples:
Search: 10 RPS, Upload: 1 RPS

Rate Limit Response Strategies

🚫 Hard Limits

Behavior:
Immediately reject requests that exceed limits
HTTP Response:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1640995200
Retry-After: 60
Use Cases:
  • Protection against abuse
  • Resource conservation
  • Cost control
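A minimal sketch of building that kind of response, assuming a fixed-window counter (the function name and argument layout are illustrative, not a specific framework's API):

```python
def check_request(counter, limit, window_start, window_size, now):
    """Return (status, headers) for a request against a fixed-window counter.

    All times are Unix epoch seconds; `counter` is the count so far this window.
    """
    reset_epoch = int(window_start + window_size)
    if counter < limit:
        return 200, {
            "X-RateLimit-Limit": str(limit),
            "X-RateLimit-Remaining": str(limit - counter - 1),
            "X-RateLimit-Reset": str(reset_epoch),
        }
    # Over the limit: reject with 429 and tell the client when to retry
    return 429, {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(reset_epoch),
        "Retry-After": str(max(0, reset_epoch - int(now))),
    }
```

Returning the headers on successful responses too, not just on 429s, lets well-behaved clients pace themselves before they ever hit the limit.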

🐌 Throttling

Behavior:
Slow down requests instead of rejecting them
Techniques:
  • Add artificial delays
  • Queue requests
  • Reduce response size
  • Lower quality results
Use Cases:
  • Better user experience
  • Gradual degradation
  • Revenue preservation

Monitoring Rate Limiting

📊 Key Metrics

  • Rate limit hit rate by endpoint
  • Top rate-limited users/IPs
  • Request acceptance vs. rejection ratio
  • Average rate limit utilization
  • Peak usage patterns
  • Geographic distribution of limits hit
  • Cost per rate-limited request

🚨 Alerts

  • Spike in rate limit violations
  • Unusual traffic patterns
  • Rate limiting system failure
  • High rejection rates impacting revenue
  • Configuration changes
  • Storage system latency spikes
  • DDoS attack patterns

📈 Analytics

  • User behavior analysis
  • Optimal limit recommendations
  • Revenue impact of rate limiting
  • Conversion rates by rate limit tier
  • Seasonal traffic patterns
  • A/B testing rate limit values
  • Churn correlation with limits

Rate Limiting Best Practices

✅ Do's

  • Use meaningful HTTP headers (X-RateLimit-*)
  • Implement graceful degradation
  • Provide clear error messages
  • Allow burst traffic with token bucket
  • Monitor rate limit effectiveness
  • Use different limits for different operations
  • Implement retry logic with exponential backoff
  • Consider user experience in limit design
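The retry-with-exponential-backoff point can be sketched as a client-side helper. The `send_request` callable and the jitter and delay parameters are illustrative assumptions; it is expected to return a status code plus any Retry-After value:

```python
import random
import time

def retry_with_backoff(send_request, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Retry a request on 429 responses, doubling the delay each attempt with jitter."""
    for attempt in range(max_retries):
        status, retry_after = send_request()
        if status != 429:
            return status
        # Honor the server's Retry-After value when present; otherwise
        # back off exponentially (base * 2^attempt), capped at max_delay.
        delay = retry_after if retry_after else min(max_delay, base_delay * 2 ** attempt)
        # Add jitter so many clients don't retry in lockstep
        time.sleep(delay + random.uniform(0, 0.1 * delay))
    return 429  # give up after max_retries
```

The jitter matters in practice: without it, a fleet of clients rejected at the same moment would all retry at the same moment, re-creating the spike the limiter was absorbing.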

❌ Don'ts

  • Don't set limits too aggressively
  • Don't ignore legitimate high-usage scenarios
  • Don't implement rate limiting without monitoring
  • Don't use the same limits for all users
  • Don't forget to handle distributed synchronization
  • Don't ignore the performance impact
  • Don't make error messages too cryptic
  • Don't rate limit health checks

Real-World Rate Limiting

🐦 Twitter API

Strategy: User + endpoint-based limits with time windows
Limits:
  • Tweet posting: 300 per 15 minutes
  • Following: 400 per 24 hours
  • Direct messages: 1000 per 24 hours
Implementation: Token bucket with user-specific buckets

πŸ—ΊοΈ Google Maps API

Strategy: Quota-based with different limits per API
Limits:
  • Geocoding: 50 RPS
  • Directions: 50 RPS
  • Places: 100 RPS
Features: Burst allowance, per-user quotas, billing integration

📧 SendGrid API

Strategy: Tiered limits based on subscription plan
Limits:
  • Free: 100 emails/day
  • Essentials: 40,000 emails/day
  • Pro: 120,000 emails/day
Features: Burst handling, detailed usage analytics

🛒 Shopify API

Strategy: Leaky bucket with call credit system
Limits:
  • REST API: 2 RPS (40 call credit bucket)
  • GraphQL: Query complexity-based limits
Features: Call credit refill, complexity scoring

πŸ“ Rate Limiting Knowledge Quiz

1 of 5Current: 0/5

What is the primary purpose of rate limiting?