Circuit Breakers
Master the resilience pattern that prevents cascading failures in distributed systems
Circuit Breaker Fundamentals
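A Circuit Breaker is a resilience pattern that monitors calls to a downstream service and, once failures cross a threshold, stops sending requests to it for a period. Callers fail fast instead of waiting on a service that is likely to fail, the struggling service gets time to recover, and the failure is kept from cascading through the rest of the system.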
🎯 Key Problems Solved
- Cascading failures in distributed systems
- Resource exhaustion from repeated failed calls
- Long response times due to timeouts
- Thread pool exhaustion
- Unnecessary load on failing services
- Poor user experience during outages
⚡ Key Benefits
- Fail fast instead of waiting for timeouts
- Preserve system resources
- Allow downstream services to recover
- Provide fallback mechanisms
- Improve overall system stability
- Enable graceful degradation
Circuit Breaker States
🟢 CLOSED State
Normal operation - all requests pass through to the service
Behavior:
- Requests flow normally to service
- Monitor failure rate and response times
- Count consecutive failures
- Switch to OPEN when threshold exceeded
Metrics Tracked:
- Success/failure counts
- Response times
- Timeout occurrences
🔴 OPEN State
Failure detected - requests fail immediately without calling service
Behavior:
- Block all requests to service
- Return cached response or error
- Execute fallback logic
- Wait for recovery timeout
Benefits:
- Prevents cascade failures
- Reduces resource usage
- Allows service recovery
🟡 HALF-OPEN State
Testing recovery - limited requests allowed to test service health
Behavior:
- Allow limited test requests
- Monitor test request success
- Switch to CLOSED if successful
- Switch to OPEN if failures continue
Configuration:
- Number of test requests
- Success threshold
- Test request timeout
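The HALF-OPEN configuration knobs above (number of test requests, success threshold) can be sketched as a small gate object that the breaker consults while testing recovery. `HalfOpenGate`, its parameter names, and the rule that any failed probe reopens the circuit are illustrative assumptions, not any particular library's API; the per-request test timeout is left to the caller.

```python
class HalfOpenGate:
    """Hypothetical HALF-OPEN bookkeeping, assuming the breaker creates a fresh
    gate each time it leaves OPEN. `max_trials` caps how many test requests are
    admitted; `success_threshold` is how many must succeed before closing.
    Closing is only possible if max_trials >= success_threshold."""

    def __init__(self, max_trials=3, success_threshold=2):
        self.max_trials = max_trials
        self.success_threshold = success_threshold
        self.in_flight = 0   # test requests admitted but not yet finished
        self.successes = 0   # test requests that completed successfully

    def try_acquire(self):
        # Admit a test request only while the total number of trials is under the cap.
        if self.in_flight + self.successes >= self.max_trials:
            return False
        self.in_flight += 1
        return True

    def record(self, success):
        """Return the next circuit state: 'CLOSED', 'OPEN', or 'HALF_OPEN' (keep testing)."""
        self.in_flight -= 1
        if not success:
            return "OPEN"  # a failed probe sends the circuit straight back to OPEN
        self.successes += 1
        if self.successes >= self.success_threshold:
            return "CLOSED"
        return "HALF_OPEN"
```

Requiring several successes rather than one makes the breaker less likely to close on a single lucky request while the service is still unstable.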
State Transition Flow
- CLOSED (normal operation) → OPEN (failing fast): failure threshold exceeded
- OPEN → HALF-OPEN (testing recovery): recovery timeout reached
- HALF-OPEN → CLOSED: test requests succeed
- HALF-OPEN → OPEN: test requests fail
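Because the pattern has only four transitions, they can be written down as a lookup table, which is handy for unit-testing a breaker or validating state-change events. The event names below are hypothetical labels for the triggers listed above.

```python
# (current_state, event) -> next_state; event names are illustrative labels.
TRANSITIONS = {
    ("CLOSED", "failure_threshold_exceeded"): "OPEN",
    ("OPEN", "recovery_timeout_reached"): "HALF_OPEN",
    ("HALF_OPEN", "test_requests_succeed"): "CLOSED",
    ("HALF_OPEN", "test_requests_fail"): "OPEN",
}


def next_state(state, event):
    """Return the state after `event`; events that don't apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```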
Implementation Patterns
🔧 Basic Implementation
Basic Circuit Breaker Implementation
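```python
import time
from enum import Enum


class CircuitState(Enum):
    CLOSED = "CLOSED"
    OPEN = "OPEN"
    HALF_OPEN = "HALF_OPEN"


class CircuitOpenException(Exception):
    """Raised when a call is rejected because the circuit is OPEN."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30, expected_exception=Exception):
        self.failure_threshold = failure_threshold    # failures before tripping to OPEN
        self.recovery_timeout = recovery_timeout      # seconds to stay OPEN before a trial call
        self.expected_exception = expected_exception  # exception type(s) counted as failures
        self.failure_count = 0
        self.last_failure_time = None
        self.state = CircuitState.CLOSED

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if self._should_attempt_reset():
                # Recovery timeout has elapsed: let a trial request through.
                self.state = CircuitState.HALF_OPEN
            else:
                # Fail fast without touching the downstream service.
                raise CircuitOpenException("Circuit breaker is OPEN")
        try:
            result = func(*args, **kwargs)
        except self.expected_exception:
            self._on_failure()
            raise  # re-raise with the original traceback
        self._on_success()
        return result

    def _should_attempt_reset(self):
        # time.monotonic() is immune to wall-clock adjustments, unlike time.time().
        return (time.monotonic() - self.last_failure_time) >= self.recovery_timeout

    def _on_success(self):
        # A success in CLOSED or HALF_OPEN resets the breaker.
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.monotonic()
        # A failed trial in HALF_OPEN reopens immediately; in CLOSED, trip at the threshold.
        if self.state == CircuitState.HALF_OPEN or self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN
```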
📊 Advanced Features
Sliding Window:
Track success/failure rates over time windows
Exponential Backoff:
Increase recovery timeout with repeated failures
Health Checks:
Probe a dedicated health endpoint to decide when to attempt recovery, instead of relying only on live traffic
Fallback Mechanisms:
Default responses or alternative services
Metrics Collection:
Detailed monitoring and alerting
Configuration Management:
Runtime threshold adjustments
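The Sliding Window and Exponential Backoff features above can be combined in one small sketch: the breaker trips on the failure *rate* over the last N calls (a count-based window) rather than on consecutive failures, and each failed recovery trial doubles the time spent OPEN, up to a cap. `SlidingWindowBreaker` and all of its parameters are hypothetical; the injectable `clock` is an assumption that makes the timing testable. For brevity it does not limit how many trial calls pass while HALF_OPEN.

```python
import time
from collections import deque


class SlidingWindowBreaker:
    """Sketch: trips on the failure rate over the last `window_size` calls and
    doubles the OPEN period each time a recovery trial fails (capped)."""

    def __init__(self, window_size=20, failure_rate_threshold=0.5, min_calls=10,
                 base_timeout=5.0, max_timeout=300.0, clock=time.monotonic):
        self.window = deque(maxlen=window_size)  # True = success, False = failure
        self.failure_rate_threshold = failure_rate_threshold
        self.min_calls = min_calls                # don't judge a rate from too few calls
        self.base_timeout = base_timeout
        self.max_timeout = max_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.trips = 0          # consecutive trips without a successful recovery
        self.opened_at = None

    def current_timeout(self):
        # Exponential backoff: base * 2^(trips - 1), capped at max_timeout.
        return min(self.base_timeout * 2 ** (self.trips - 1), self.max_timeout)

    def allow_request(self):
        if self.state == "OPEN" and self.clock() - self.opened_at >= self.current_timeout():
            self.state = "HALF_OPEN"
        return self.state != "OPEN"

    def record(self, success):
        if self.state == "OPEN":
            return  # late result from a call started before the trip; ignore it
        if self.state == "HALF_OPEN":
            if success:
                self.state, self.trips = "CLOSED", 0
                self.window.clear()  # start the next window fresh
            else:
                self._trip()
            return
        self.window.append(success)
        failures = self.window.count(False)
        if (len(self.window) >= self.min_calls
                and failures / len(self.window) >= self.failure_rate_threshold):
            self._trip()

    def _trip(self):
        self.state = "OPEN"
        self.trips += 1
        self.opened_at = self.clock()
```

A rate over a window tolerates scattered errors on a busy service, while `min_calls` keeps a single early failure from tripping the circuit on low traffic.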
Fallback Strategies
📦 Data Fallbacks
Cached Responses:
Return previously successful responses
Default Values:
Provide sensible defaults for missing data
Stale Data:
Return slightly outdated information
Static Content:
Serve pre-generated content
Example:
```python
if circuit_open:
    return cache.get('user_profile') or DEFAULT_PROFILE
```
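The cached-response, stale-data, and default-value fallbacks above can be combined in one helper: remember the last good value per key, serve it while it is not too old, and fall back to a default otherwise. `FallbackCache` and `max_staleness` are hypothetical names; treating any exception from the protected call (including a breaker's "circuit open" rejection) as a trigger for the fallback is an assumption of this sketch.

```python
import time


class FallbackCache:
    """Hypothetical helper: remembers the last good value per key and serves it
    (possibly stale) when the protected call fails or is short-circuited."""

    def __init__(self, max_staleness=300.0, clock=time.monotonic):
        self._entries = {}               # key -> (value, stored_at)
        self.max_staleness = max_staleness
        self.clock = clock

    def call(self, key, primary, default):
        try:
            value = primary()
        except Exception:
            entry = self._entries.get(key)
            if entry is not None and self.clock() - entry[1] <= self.max_staleness:
                return entry[0]          # stale-but-acceptable cached response
            return default               # sensible default when nothing usable is cached
        self._entries[key] = (value, self.clock())  # refresh cache on success
        return value
```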
🔄 Service Fallbacks
Alternative Services:
Route to backup service instances
Degraded Mode:
Provide reduced functionality
Queue for Later:
Store requests for delayed processing
User Notification:
Inform users about temporary issues
Example:
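```python
if circuit_open:
    # Try the backup processor; if it returns nothing, queue the request for retry
    return (backup_service.process_payment(payment_request)
            or queue_for_retry(payment_request))
```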
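A "Queue for Later" fallback such as the `queue_for_retry` call above could be backed by a bounded FIFO that is drained once the circuit closes. `RetryQueue`, its size limit, and the stop-at-first-failure drain policy are illustrative assumptions; a production system would usually persist the queue (for example in a message broker) rather than keep it in memory.

```python
from collections import deque


class RetryQueue:
    """Hypothetical 'queue for later' fallback: park requests while the
    downstream is unavailable and replay them once it recovers."""

    def __init__(self, max_size=1000):
        self.pending = deque()
        self.max_size = max_size

    def enqueue(self, request):
        if len(self.pending) >= self.max_size:
            raise OverflowError("retry queue full")  # bound memory during long outages
        self.pending.append(request)
        return {"status": "queued", "position": len(self.pending)}

    def drain(self, process):
        """Replay queued requests in FIFO order. Stop at the first failure so the
        remaining requests keep their order for the next attempt."""
        processed = 0
        while self.pending:
            request = self.pending[0]
            try:
                process(request)
            except Exception:
                break
            self.pending.popleft()  # remove only after successful processing
            processed += 1
        return processed
```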
Circuit Breaker Libraries
☕ Java/JVM
Hystrix (Netflix):
Pioneer library with comprehensive features (now in maintenance mode)
Resilience4j:
Modern, lightweight alternative to Hystrix
Failsafe:
Simple, flexible failure handling library
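```java
// Resilience4j annotation: io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker
@CircuitBreaker(name = "payment-service")
public PaymentResult processPayment(Payment p) {
    return paymentService.process(p);
}
```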
🐍 Python
pybreaker:
Simple circuit breaker implementation
circuitbreaker:
Decorator-based circuit breaker
tenacity:
General-purpose retry library (retries with backoff; it has no built-in circuit breaker, so it is usually paired with one)
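```python
import requests
from circuitbreaker import circuit


# The circuitbreaker package's decorator is named `circuit`
@circuit(failure_threshold=5, recovery_timeout=30)
def call_external_api():
    response = requests.get('https://api.example.com', timeout=5)
    response.raise_for_status()  # count HTTP error responses as failures too
    return response
```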
🌐 Node.js
opossum:
Full-featured circuit breaker for Node.js
cockatiel:
Resilience and transient-fault-handling library
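```javascript
const CircuitBreaker = require('opossum');

// callExternalService is your async function. Calls slower than 3 s count as failures,
// the circuit opens at a 50% error rate, and a trial call is allowed after 30 s.
const options = { timeout: 3000, errorThresholdPercentage: 50, resetTimeout: 30000 };
const breaker = new CircuitBreaker(callExternalService, options);

breaker.fire().then(console.log).catch(console.error);
```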
🐹 Go
sony/gobreaker:
Simple and effective circuit breaker
rubyist/circuitbreaker:
Circuit breaker with various failure detectors
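```go
cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{
    Name:        "ExternalAPI",
    MaxRequests: 3,                // requests allowed through while HALF-OPEN
    Timeout:     30 * time.Second, // how long to stay OPEN before moving to HALF-OPEN
})
```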
Monitoring Circuit Breakers
📊 Key Metrics
- State transitions (CLOSED → OPEN → HALF_OPEN)
- Failure rate percentage
- Request success/failure counts
- Response time percentiles
- Circuit open/close duration
- Fallback execution count
- Recovery attempt success rate
🚨 Alerts
- Circuit breaker opened
- High failure rate detected
- Recovery attempts failing
- Circuit stuck in half-open state
- Fallback mechanism activated
- Threshold configuration changes
- Service degradation detected
📈 Dashboards
- Real-time circuit state visualization
- Success/failure rate trends
- Response time distributions
- Service dependency health map
- Historical failure patterns
- Resource usage correlation
- Business impact metrics
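Two items from the lists above, state transitions with open/close duration and the "circuit stuck in half-open state" alert, can be derived from a single state-change hook. `BreakerMetrics` is a hypothetical sink; the assumption is that your breaker exposes some callback on state change (the hook name and signature vary by library), and in practice these counters would be exported to your metrics system rather than kept in memory.

```python
import time
from collections import Counter


class BreakerMetrics:
    """Hypothetical metrics sink: counts state transitions and accumulates the
    time spent in each state. Wire `on_state_change` into your breaker's hook."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.transitions = Counter()    # (from_state, to_state) -> count
        self.time_in_state = Counter()  # state -> seconds accumulated
        self.state = "CLOSED"
        self.entered_at = clock()

    def on_state_change(self, old, new):
        now = self.clock()
        self.time_in_state[old] += now - self.entered_at
        self.transitions[(old, new)] += 1
        self.state, self.entered_at = new, now

    def stuck_in_half_open(self, limit_seconds):
        # Alert condition: the breaker has been HALF_OPEN longer than expected.
        return self.state == "HALF_OPEN" and self.clock() - self.entered_at > limit_seconds
```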
Circuit Breaker Best Practices
✅ Do's
- Set appropriate failure thresholds based on service SLAs
- Implement meaningful fallback mechanisms
- Monitor circuit breaker metrics and trends
- Test circuit breaker behavior in staging
- Configure different timeouts for different operations
- Use exponential backoff for recovery attempts
- Implement proper logging for debugging
- Consider business impact when setting thresholds
❌ Don'ts
- Don't set failure thresholds too low (false positives)
- Don't ignore the need for fallback mechanisms
- Don't wrap every call in a circuit breaker; reserve them for remote or failure-prone dependencies
- Don't forget to handle circuit breaker exceptions
- Don't use the same configuration for all services
- Don't bypass circuit breakers in "urgent" situations
- Don't forget to test failure scenarios
- Don't overlook the cost of circuit breaker overhead
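To follow the advice on per-service configuration and per-operation timeouts, settings can live in one table keyed by dependency, with an explicit default. The service names and every number below are hypothetical placeholders chosen for illustration, not recommended values; real thresholds should come from each dependency's SLA and observed traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BreakerConfig:
    failure_threshold: int   # failures before the circuit opens
    recovery_timeout: float  # seconds OPEN before a trial call
    call_timeout: float      # per-call timeout passed to the client


# Hypothetical per-dependency settings; values are illustrative only.
BREAKER_CONFIGS = {
    "payment-service": BreakerConfig(failure_threshold=3, recovery_timeout=60, call_timeout=2.0),
    "recommendations": BreakerConfig(failure_threshold=10, recovery_timeout=15, call_timeout=0.5),
}
DEFAULT_CONFIG = BreakerConfig(failure_threshold=5, recovery_timeout=30, call_timeout=1.0)


def config_for(service: str) -> BreakerConfig:
    """Look up a dependency's settings, falling back to the shared default."""
    return BREAKER_CONFIGS.get(service, DEFAULT_CONFIG)
```

Keeping the default explicit makes it obvious which dependencies have been tuned and which still run on shared settings.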
Real-World Use Cases
🎬 Netflix
Challenge: Millions of users streaming content with hundreds of microservices
Solution: Hystrix circuit breakers protecting critical paths
Results:
- 99.99% uptime despite service failures
- Graceful degradation during peak traffic
- Fast recovery from cascade failures
🏦 Financial Trading
Challenge: High-frequency trading with strict latency requirements
Solution: Fast-fail circuit breakers with cached prices
Results:
- Sub-millisecond fallback responses
- Prevented market data feed cascades
- Maintained trading during partner outages
🛒 E-commerce
Challenge: Payment processing during Black Friday traffic spikes
Solution: Circuit breakers with payment queue fallbacks
Results:
- Maintained checkout flow during payment gateway issues
- Processed queued payments after recovery
- Preserved customer shopping sessions
📱 Social Media
Challenge: Real-time notifications across global user base
Solution: Circuit breakers protecting notification pipelines
Results:
- Prevented notification storm cascades
- Maintained core features during outages
- Improved user experience with fallback content