Rate Limiting

Master traffic control patterns that protect APIs and services from overload while maintaining fair resource allocation



Rate Limiting Fundamentals

Rate limiting is a technique for controlling the rate of requests a client may send to, or a service will accept from, the network. It's used to prevent abuse, ensure fair resource allocation, and maintain service quality.

🎯 Primary Goals

  • Prevent API abuse and DoS attacks
  • Ensure fair resource allocation among users
  • Maintain service stability under load
  • Protect downstream services
  • Monetize API usage through tiered limits
  • Comply with third-party API limits

πŸ—οΈ Key Components

  • Identifier: User, IP, API key, or service
  • Limit: Maximum requests per time window
  • Window: Time period for counting requests
  • Counter: Current request count
  • Action: Allow, deny, or throttle
  • Response: Headers and error messages

Rate Limiting Algorithms

🪣 Token Bucket

Tokens are added to a bucket at a fixed rate. Each request consumes a token.
How it works:
  • Bucket holds up to N tokens
  • Tokens added at rate R per second
  • Each request consumes 1 token
  • Request denied if no tokens available
Advantages:
  • Allows burst traffic
  • Simple to implement
  • Memory efficient
Use cases:
API gateways, CDNs, traffic shaping
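The steps above can be sketched as a minimal single-process token bucket in Python (class and method names are illustrative; the Redis example later in this section covers the distributed case):

```python
import time

class TokenBucket:
    """Refills `rate` tokens per second up to `capacity`; each request costs one token."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow_request(self):
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Because the bucket starts full, a fresh client can burst up to `capacity` requests at once before being held to the steady refill rate.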

💧 Leaky Bucket

Requests are processed at a fixed rate; overflow requests are dropped.
How it works:
  • Requests enter bucket (queue)
  • Processed at fixed rate
  • Bucket has maximum capacity
  • Overflow requests are dropped
Advantages:
  • Smooth output rate
  • No burst traffic
  • Predictable behavior
Use cases:
Network QoS, streaming services, rate smoothing
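The queue-and-drain behavior described above can be sketched in-process (names and the monotonic-clock choice are illustrative assumptions):

```python
import time
from collections import deque

class LeakyBucket:
    """Queue up to `capacity` requests; drain them at `rate` per second."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.queue = deque()
        self.last_leak = time.monotonic()

    def _leak(self):
        now = time.monotonic()
        # Number of requests drained since the last check
        leaked = int((now - self.last_leak) * self.rate)
        if leaked > 0:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last_leak = now

    def try_enqueue(self):
        self._leak()
        if len(self.queue) < self.capacity:
            self.queue.append(time.monotonic())
            return True
        return False  # overflow: request dropped
```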

🪟 Fixed Window

Count requests in fixed time windows (e.g., per minute).
How it works:
  • Define time window (e.g., 1 minute)
  • Count requests in current window
  • Reset counter at window boundary
  • Deny if counter exceeds limit
Problem:
Boundary effect: a user can send up to 2x the limit in a short burst straddling a window boundary
Use cases:
Simple APIs, basic protection, logging systems
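The counter-and-reset logic above, as a minimal single-process sketch (class name is illustrative):

```python
import time

class FixedWindowRateLimiter:
    """Counts requests per fixed window; the counter resets at each boundary."""

    def __init__(self, limit, window_size):
        self.limit = limit
        self.window_size = window_size
        self.window_start = time.monotonic()
        self.count = 0

    def allow_request(self):
        now = time.monotonic()
        if now - self.window_start >= self.window_size:
            # Window boundary crossed: start a new window
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

Note that this sketch exhibits exactly the boundary effect described above: a client can exhaust one window just before the reset and the next window just after it.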

🎢 Sliding Window

Continuously track requests over a sliding time window.
How it works:
  • Maintain request timestamps
  • Remove old requests outside window
  • Count remaining requests
  • Deny if count exceeds limit
Advantages:
  • Accurate rate limiting
  • No boundary effects
  • Smooth operation
Disadvantages:
Higher memory usage, more complex implementation

Algorithm Comparison

Algorithm       | Burst Traffic      | Memory Usage | Accuracy  | Implementation | Best For
Token Bucket    | ✅ Allowed         | Low          | Good      | Simple         | API gateways
Leaky Bucket    | ❌ Not allowed     | Medium       | High      | Medium         | QoS systems
Fixed Window    | ⚠️ Boundary issues | Very Low     | Poor      | Very Simple    | Basic protection
Sliding Window  | ⚠️ Limited         | High         | Excellent | Complex        | Precise control

Implementation Examples

🪣 Token Bucket (Redis)

Token Bucket Rate Limiter (Redis Lua Script)
-- Atomic token-bucket check-and-consume, executed as a Redis Lua script
-- KEYS[1]: bucket key
-- ARGV[1]: capacity, ARGV[2]: tokens added per interval, ARGV[3]: interval (seconds)
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill = tonumber(ARGV[2])
local interval = tonumber(ARGV[3])

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local current_tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or 0

-- Refill in proportion to the time elapsed since the last request
local now = tonumber(redis.call('TIME')[1])
local elapsed = now - last_refill
current_tokens = math.min(capacity,
    current_tokens + elapsed * refill / interval)

if current_tokens >= 1 then
  current_tokens = current_tokens - 1
  redis.call('HMSET', key, 'tokens', current_tokens, 'last_refill', now)
  redis.call('EXPIRE', key, interval * 2)
  return 1  -- allowed
else
  return 0  -- denied
end

🪟 Sliding Window (Python)

Sliding Window Rate Limiter Implementation
import time
from collections import deque

class SlidingWindowRateLimiter:
    def __init__(self, limit, window_size):
        self.limit = limit
        self.window_size = window_size
        self.requests = deque()

    def allow_request(self):
        now = time.time()
        # Remove old requests
        while (self.requests and 
               self.requests[0] <= now - self.window_size):
            self.requests.popleft()

        # Check if limit exceeded
        if len(self.requests) < self.limit:
            self.requests.append(now)
            return True
        return False

Distributed Rate Limiting

⚠️ Challenges

Race Conditions:
Multiple instances updating counters simultaneously
Synchronization Latency:
Network delays in distributed counter updates
Partial Failures:
Some nodes may have stale counter information
Hot Partitioning:
Popular users creating hotspots in storage

✅ Solutions

Centralized Storage:
Redis/Hazelcast for shared state
Sticky Sessions:
Route users to same instance
Approximate Counters:
Accept some inaccuracy for performance
Hierarchical Limits:
Global + local rate limits
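The hierarchical-limits idea above can be sketched in-process. In a real deployment the global counter would live in shared storage such as Redis; here both levels are plain dicts purely for illustration, and all names are assumptions:

```python
import time

class HierarchicalLimiter:
    """Check a per-user (local) limit first, then a shared global limit."""

    def __init__(self, user_limit, global_limit, window_size):
        self.user_limit = user_limit
        self.global_limit = global_limit
        self.window_size = window_size
        self.window_start = time.monotonic()
        self.user_counts = {}
        self.global_count = 0

    def allow_request(self, user_id):
        now = time.monotonic()
        if now - self.window_start >= self.window_size:
            # New window: reset both levels
            self.window_start = now
            self.user_counts.clear()
            self.global_count = 0
        if self.user_counts.get(user_id, 0) >= self.user_limit:
            return False  # per-user limit hit
        if self.global_count >= self.global_limit:
            return False  # global capacity exhausted
        self.user_counts[user_id] = self.user_counts.get(user_id, 0) + 1
        self.global_count += 1
        return True
```

Checking the cheap local limit first means most rejections never touch the shared counter, which reduces load on the centralized store.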

πŸ—οΈ Distributed Architecture

Load Balancer (routes requests to instances)
        ↓
API Instance 1   API Instance 2   ...   API Instance N
(each instance keeps local counters)
        ↕
Centralized Store (Redis) holding shared rate limit state

Rate Limiting Strategies

👤 User-Based

Identification:
  • API keys
  • User authentication tokens
  • OAuth client IDs
  • Account identifiers
Advantages:
  • Fair allocation per user
  • Supports tiered pricing
  • Prevents single user abuse
Use Cases:
SaaS APIs, freemium models

🌐 IP-Based

Identification:
  • Source IP addresses
  • IP ranges/subnets
  • Geo-location based
  • ASN-based limits
Advantages:
  • Simple implementation
  • Works for anonymous users
  • DDoS protection
Challenges:
NAT, CDNs, shared IPs

🔧 Resource-Based

Granularity:
  • Per endpoint/method
  • Per operation type
  • Per resource category
  • Compute cost-based
Advantages:
  • Protects expensive operations
  • Fine-grained control
  • Resource-aware limiting
Examples:
Search: 10 RPS, Upload: 1 RPS

Rate Limit Response Strategies

🚫 Hard Limits

Behavior:
Immediately reject requests that exceed limits
HTTP Response:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1640995200
Retry-After: 60
Use Cases:
  • Protection against abuse
  • Resource conservation
  • Cost control
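A minimal sketch of building that kind of response, assuming a fixed-window counter (the function name and argument layout are illustrative, not a specific framework's API):

```python
def check_request(counter, limit, window_start, window_size, now):
    """Return (status, headers) for a request against a fixed-window counter.

    All times are Unix epoch seconds; `counter` is the count so far this window.
    """
    reset_epoch = int(window_start + window_size)
    if counter < limit:
        return 200, {
            "X-RateLimit-Limit": str(limit),
            "X-RateLimit-Remaining": str(limit - counter - 1),
            "X-RateLimit-Reset": str(reset_epoch),
        }
    # Over the limit: reject with 429 and tell the client when to retry
    return 429, {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(reset_epoch),
        "Retry-After": str(max(0, reset_epoch - int(now))),
    }
```

Returning the headers on successful responses too, not just on 429s, lets well-behaved clients pace themselves before they ever hit the limit.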

🐌 Throttling

Behavior:
Slow down requests instead of rejecting them
Techniques:
  • Add artificial delays
  • Queue requests
  • Reduce response size
  • Lower quality results
Use Cases:
  • Better user experience
  • Gradual degradation
  • Revenue preservation

Monitoring Rate Limiting

📊 Key Metrics

  • Rate limit hit rate by endpoint
  • Top rate-limited users/IPs
  • Request acceptance vs. rejection ratio
  • Average rate limit utilization
  • Peak usage patterns
  • Geographic distribution of limits hit
  • Cost per rate-limited request

🚨 Alerts

  • Spike in rate limit violations
  • Unusual traffic patterns
  • Rate limiting system failure
  • High rejection rates impacting revenue
  • Configuration changes
  • Storage system latency spikes
  • DDoS attack patterns

📈 Analytics

  • User behavior analysis
  • Optimal limit recommendations
  • Revenue impact of rate limiting
  • Conversion rates by rate limit tier
  • Seasonal traffic patterns
  • A/B testing rate limit values
  • Churn correlation with limits

Rate Limiting Best Practices

✅ Do's

  • Use meaningful HTTP headers (X-RateLimit-*)
  • Implement graceful degradation
  • Provide clear error messages
  • Allow burst traffic with token bucket
  • Monitor rate limit effectiveness
  • Use different limits for different operations
  • Implement retry logic with exponential backoff
  • Consider user experience in limit design
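The retry-with-exponential-backoff point can be sketched as a client-side helper. The `send_request` callable and the jitter and delay parameters are illustrative assumptions; it is expected to return a status code plus any Retry-After value:

```python
import random
import time

def retry_with_backoff(send_request, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Retry a request on 429 responses, doubling the delay each attempt with jitter."""
    for attempt in range(max_retries):
        status, retry_after = send_request()
        if status != 429:
            return status
        # Honor the server's Retry-After value when present; otherwise
        # back off exponentially (base * 2^attempt), capped at max_delay.
        delay = retry_after if retry_after else min(max_delay, base_delay * 2 ** attempt)
        # Add jitter so many clients don't retry in lockstep
        time.sleep(delay + random.uniform(0, 0.1 * delay))
    return 429  # give up after max_retries
```

The jitter matters in practice: without it, a fleet of clients rejected at the same moment would all retry at the same moment, re-creating the spike the limiter was absorbing.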

❌ Don'ts

  • Don't set limits too aggressively
  • Don't ignore legitimate high-usage scenarios
  • Don't implement rate limiting without monitoring
  • Don't use the same limits for all users
  • Don't forget to handle distributed synchronization
  • Don't ignore the performance impact
  • Don't make error messages too cryptic
  • Don't rate limit health checks

Real-World Rate Limiting

🐦 Twitter API

Strategy: User + endpoint-based limits with time windows
Limits:
  • Tweet posting: 300 per 15 minutes
  • Following: 400 per 24 hours
  • Direct messages: 1000 per 24 hours
Implementation: Token bucket with user-specific buckets

πŸ—ΊοΈ Google Maps API

Strategy: Quota-based with different limits per API
Limits:
  • Geocoding: 50 RPS
  • Directions: 50 RPS
  • Places: 100 RPS
Features: Burst allowance, per-user quotas, billing integration

📧 SendGrid API

Strategy: Tiered limits based on subscription plan
Limits:
  • Free: 100 emails/day
  • Essentials: 40,000 emails/day
  • Pro: 120,000 emails/day
Features: Burst handling, detailed usage analytics

🛒 Shopify API

Strategy: Leaky bucket with call credit system
Limits:
  • REST API: 2 RPS (40 call credit bucket)
  • GraphQL: Query complexity-based limits
Features: Call credit refill, complexity scoring

πŸ“ Rate Limiting Knowledge Quiz

1 of 5Current: 0/5

What is the primary purpose of rate limiting?