Circuit Breakers

Master the resilience pattern that prevents cascading failures in distributed systems



Circuit Breaker Fundamentals

A Circuit Breaker is a resilience pattern that monitors failures and prevents calls to a service when it's likely to fail, allowing the system to recover and preventing cascading failures.

🎯 Key Problems Solved

  • Cascading failures in distributed systems
  • Resource exhaustion from repeated failed calls
  • Long response times due to timeouts
  • Thread pool exhaustion
  • Unnecessary load on failing services
  • Poor user experience during outages

⚡ Key Benefits

  • Fail fast instead of waiting for timeouts
  • Preserve system resources
  • Allow downstream services to recover
  • Provide fallback mechanisms
  • Improve overall system stability
  • Enable graceful degradation

Circuit Breaker States

🟢 CLOSED State

Normal operation - all requests pass through to the service
Behavior:
  • Requests flow normally to service
  • Monitor failure rate and response times
  • Count consecutive failures
  • Switch to OPEN when threshold exceeded
Metrics Tracked:
  • Success/failure counts
  • Response times
  • Timeout occurrences

🔴 OPEN State

Failure detected - requests fail immediately without calling service
Behavior:
  • Block all requests to service
  • Return cached response or error
  • Execute fallback logic
  • Wait for recovery timeout
Benefits:
  • Prevents cascade failures
  • Reduces resource usage
  • Allows service recovery

🟡 HALF-OPEN State

Testing recovery - limited requests allowed to test service health
Behavior:
  • Allow limited test requests
  • Monitor test request success
  • Switch to CLOSED if successful
  • Switch to OPEN if failures continue
Configuration:
  • Number of test requests
  • Success threshold
  • Test request timeout
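
The configuration options above can be sketched as a small gate object. This is a minimal illustration, not code from any library: the class name `HalfOpenGate` and the parameters `max_trials` and `required_successes` are hypothetical, and the policy shown (any failed trial reopens the circuit) is only one reasonable choice.

```python
class HalfOpenGate:
    """Admits a limited number of trial calls and decides the next state.

    Hypothetical helper for illustration; names are not from any library.
    """

    def __init__(self, max_trials=3, required_successes=2):
        self.max_trials = max_trials                  # number of test requests
        self.required_successes = required_successes  # success threshold
        self.admitted = 0
        self.successes = 0

    def try_admit(self):
        """Return True if another trial request may be sent."""
        if self.admitted >= self.max_trials:
            return False
        self.admitted += 1
        return True

    def record(self, success):
        """Record a trial result; return 'CLOSED', 'OPEN', or None (undecided)."""
        if not success:
            return "OPEN"    # any failed trial reopens the circuit
        self.successes += 1
        if self.successes >= self.required_successes:
            return "CLOSED"  # enough successes: resume normal traffic
        return None
```

Requests that `try_admit()` rejects would be handled like requests in the OPEN state: fail fast or run the fallback.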

State Transition Flow

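  • CLOSED (Normal Operation) → OPEN (Failing Fast): failure threshold exceeded
  • OPEN (Failing Fast) → HALF-OPEN (Testing Recovery): recovery timeout reached
  • HALF-OPEN (Testing Recovery) → CLOSED (Normal Operation): test requests succeed
  • HALF-OPEN (Testing Recovery) → OPEN (Failing Fast): test requests fail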
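
The transitions above can be written down as a lookup table, which makes it easy to check that no other state changes are possible. This is a sketch: the event names are illustrative labels taken from the flow, not identifiers from any library.

```python
from enum import Enum


class State(Enum):
    CLOSED = "CLOSED"
    OPEN = "OPEN"
    HALF_OPEN = "HALF_OPEN"


# (current state, event) -> next state; event names are illustrative.
TRANSITIONS = {
    (State.CLOSED, "failure_threshold_exceeded"): State.OPEN,
    (State.OPEN, "recovery_timeout_reached"): State.HALF_OPEN,
    (State.HALF_OPEN, "test_requests_succeed"): State.CLOSED,
    (State.HALF_OPEN, "test_requests_fail"): State.OPEN,
}


def next_state(state, event):
    """Return the next state, or the current one if the event does not apply."""
    return TRANSITIONS.get((state, event), state)
```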

Implementation Patterns

🔧 Basic Implementation

Basic Circuit Breaker Implementation
import time
from enum import Enum

class CircuitState(Enum):
    CLOSED = "CLOSED"
    OPEN = "OPEN"
    HALF_OPEN = "HALF_OPEN"

class CircuitOpenException(Exception):
    pass

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30, expected_exception=Exception):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.expected_exception = expected_exception
        self.failure_count = 0
        self.last_failure_time = None
        self.state = CircuitState.CLOSED

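    def call(self, func, *args, **kwargs):
        # While OPEN, reject calls until the recovery timeout has elapsed,
        # then let a trial call through in HALF_OPEN.
        if self.state == CircuitState.OPEN:
            if self._should_attempt_reset():
                self.state = CircuitState.HALF_OPEN
            else:
                raise CircuitOpenException("Circuit breaker is OPEN")

        try:
            result = func(*args, **kwargs)
        except self.expected_exception:
            self._on_failure()
            raise  # re-raise with the original traceback
        self._on_success()
        return result

    def _should_attempt_reset(self):
        # Monotonic time is unaffected by system clock adjustments.
        return (time.monotonic() - self.last_failure_time) >= self.recovery_timeout

    def _on_success(self):
        # Any success closes the circuit and clears the failure count.
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.monotonic()
        # A failed trial call in HALF_OPEN reopens the circuit immediately.
        if (self.state == CircuitState.HALF_OPEN
                or self.failure_count >= self.failure_threshold):
            self.state = CircuitState.OPEN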

📊 Advanced Features

Sliding Window:
Track success/failure rates over time windows
Exponential Backoff:
Increase recovery timeout with repeated failures
Health Checks:
Separate health endpoint testing
Fallback Mechanisms:
Default responses or alternative services
Metrics Collection:
Detailed monitoring and alerting
Configuration Management:
Runtime threshold adjustments
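
As one illustration of the sliding-window idea, the sketch below tracks the failure rate over the most recent calls instead of counting consecutive failures. It assumes a count-based window; the class name `SlidingWindowStats` and the parameters `window_size`, `min_calls`, and `threshold` are hypothetical, and production libraries may use time-based windows instead.

```python
from collections import deque


class SlidingWindowStats:
    """Failure rate over the most recent `window_size` calls (count-based)."""

    def __init__(self, window_size=10, min_calls=5):
        # deque with maxlen drops the oldest outcome automatically.
        self.outcomes = deque(maxlen=window_size)  # True means the call failed
        self.min_calls = min_calls

    def record(self, failed):
        self.outcomes.append(bool(failed))

    def failure_rate(self):
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def should_open(self, threshold=0.5):
        # Require a minimum sample so one early failure does not trip the breaker.
        return (len(self.outcomes) >= self.min_calls
                and self.failure_rate() >= threshold)
```

A rate-based trigger like this tolerates occasional errors under high traffic, whereas a consecutive-failure counter resets on every success and can miss a service that fails half the time.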

Fallback Strategies

📦 Data Fallbacks

Cached Responses:
Return previously successful responses
Default Values:
Provide sensible defaults for missing data
Stale Data:
Return slightly outdated information
Static Content:
Serve pre-generated content
Example:
if circuit_open:
    return cache.get('user_profile') or DEFAULT_PROFILE
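
A slightly fuller sketch of the cached-response and stale-data fallbacks follows. Everything here is hypothetical and for illustration only: `FallbackCache`, `fetch_profile`, `max_stale_seconds`, and the injected `clock` are assumed names, and the in-memory dictionary stands in for whatever cache the system actually uses.

```python
import time


class FallbackCache:
    """Remembers the last good value per key and serves it while not too stale."""

    def __init__(self, max_stale_seconds=300, clock=time.monotonic):
        self.max_stale_seconds = max_stale_seconds
        self.clock = clock
        self._entries = {}  # key -> (value, stored_at)

    def store(self, key, value):
        self._entries[key] = (value, self.clock())

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        value, stored_at = entry
        if self.clock() - stored_at > self.max_stale_seconds:
            return default  # too old to be useful: fall back to the default
        return value


def fetch_profile(user_id, fetch, cache, circuit_open, default_profile):
    """Serve from cache while the circuit is open; otherwise refresh the cache."""
    if circuit_open:
        return cache.get(user_id, default_profile)
    profile = fetch(user_id)
    cache.store(user_id, profile)
    return profile
```

Bounding staleness is a design choice: for some data an old value is better than a default, for other data (prices, balances) a clearly marked default may be safer.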

🔄 Service Fallbacks

Alternative Services:
Route to backup service instances
Degraded Mode:
Provide reduced functionality
Queue for Later:
Store requests for delayed processing
User Notification:
Inform users about temporary issues
Example:
if circuit_open:
    return (backup_service.process_payment()
            or queue_for_retry(payment_request))
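
The "queue for later" strategy can be sketched as an in-memory queue that is replayed once the service recovers. This is an illustration under stated assumptions: `RetryQueue`, `enqueue`, and `drain` are hypothetical names, and a real system would normally use a durable queue so that requests survive a restart.

```python
from collections import deque


class RetryQueue:
    """Holds requests that could not be processed while the circuit was open."""

    def __init__(self):
        self._pending = deque()

    def enqueue(self, request):
        self._pending.append(request)

    def drain(self, handler):
        """Replay queued requests in order; stop at the first failure.

        Returns the number of requests processed successfully.
        """
        processed = 0
        while self._pending:
            request = self._pending[0]
            try:
                handler(request)
            except Exception:
                break  # leave this request and the rest queued for later
            self._pending.popleft()
            processed += 1
        return processed

    def __len__(self):
        return len(self._pending)
```

For operations such as payments, replayed requests should be idempotent (for example, carry a unique request key) so that a retry cannot charge a customer twice.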

Circuit Breaker Libraries

☕ Java/JVM

Hystrix (Netflix):
Pioneer library with comprehensive features (now in maintenance mode)
Resilience4j:
Modern, lightweight alternative to Hystrix
Failsafe:
Simple, flexible failure handling library
@CircuitBreaker(name = "payment-service")
public PaymentResult processPayment(Payment p) {
    return paymentService.process(p);
}

🐍 Python

pybreaker:
Simple circuit breaker implementation
circuitbreaker:
Decorator-based circuit breaker
tenacity:
Retry library (no built-in circuit breaker); often paired with one of the libraries above
import requests
from circuitbreaker import circuit

@circuit(failure_threshold=5, recovery_timeout=30)
def call_external_api():
    return requests.get('https://api.example.com')

🌐 Node.js

opossum:
Full-featured circuit breaker for Node.js
cockatiel:
Resilience and transient-fault-handling library
const CircuitBreaker = require('opossum');
const options = { timeout: 3000, errorThresholdPercentage: 50 };
const breaker = new CircuitBreaker(callExternalService, options);

🐹 Go

sony/gobreaker:
Simple and effective circuit breaker
rubyist/circuitbreaker:
Circuit breaker with various failure detectors
cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{
    Name:        "ExternalAPI",
    MaxRequests: 3,
    Timeout:     30 * time.Second,
})

Monitoring Circuit Breakers

📊 Key Metrics

  • State transitions (CLOSED → OPEN → HALF_OPEN)
  • Failure rate percentage
  • Request success/failure counts
  • Response time percentiles
  • Circuit open/close duration
  • Fallback execution count
  • Recovery attempt success rate
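
Several of these metrics can be collected with plain counters before being exported to a monitoring system. The sketch below is illustrative only: `BreakerMetrics` and the outcome labels (`"success"`, `"failure"`, `"rejected"`, `"fallback"`) are assumed names, not part of any library.

```python
from collections import Counter


class BreakerMetrics:
    """In-memory counters for call outcomes and state transitions."""

    def __init__(self):
        self.calls = Counter()        # outcome label -> count
        self.transitions = Counter()  # (old_state, new_state) -> count

    def record_call(self, outcome):
        # Expected labels: "success", "failure", "rejected", "fallback".
        self.calls[outcome] += 1

    def record_transition(self, old_state, new_state):
        self.transitions[(old_state, new_state)] += 1

    def failure_rate_percent(self):
        # Only calls that actually reached the service count toward the rate.
        attempted = self.calls["success"] + self.calls["failure"]
        if attempted == 0:
            return 0.0
        return 100.0 * self.calls["failure"] / attempted
```

Keeping rejected calls out of the failure rate separates "the service is failing" from "the breaker is blocking traffic", which are different signals on a dashboard.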

🚨 Alerts

  • Circuit breaker opened
  • High failure rate detected
  • Recovery attempts failing
  • Circuit stuck in half-open state
  • Fallback mechanism activated
  • Threshold configuration changes
  • Service degradation detected

📈 Dashboards

  • Real-time circuit state visualization
  • Success/failure rate trends
  • Response time distributions
  • Service dependency health map
  • Historical failure patterns
  • Resource usage correlation
  • Business impact metrics

Circuit Breaker Best Practices

✅ Do's

  • Set appropriate failure thresholds based on service SLAs
  • Implement meaningful fallback mechanisms
  • Monitor circuit breaker metrics and trends
  • Test circuit breaker behavior in staging
  • Configure different timeouts for different operations
  • Use exponential backoff for recovery attempts
  • Implement proper logging for debugging
  • Consider business impact when setting thresholds
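
Exponential backoff for recovery attempts can be as small as one function. In this sketch the function name `recovery_timeout` and its parameters are hypothetical, and doubling with a fixed cap is just one common policy; adding random jitter is another option not shown here.

```python
def recovery_timeout(base_seconds, consecutive_opens, max_seconds=300.0):
    """Double the wait each time the circuit reopens, capped at max_seconds.

    consecutive_opens is 1 the first time the circuit opens, 2 if the
    following recovery attempt fails and it reopens, and so on.
    """
    exponent = max(consecutive_opens - 1, 0)
    return min(base_seconds * (2 ** exponent), max_seconds)
```

With a 30-second base this waits 30, 60, then 120 seconds, and never longer than the cap, so a service that stays down is probed less and less often.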

❌ Don'ts

  • Don't set failure thresholds too low (false positives)
  • Don't ignore the need for fallback mechanisms
  • Don't use circuit breakers for every single call
  • Don't forget to handle circuit breaker exceptions
  • Don't use the same configuration for all services
  • Don't bypass circuit breakers in "urgent" situations
  • Don't forget to test failure scenarios
  • Don't overlook the cost of circuit breaker overhead
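
Handling the breaker's own exception usually means catching only the "circuit is open" error and letting genuine service errors propagate. The sketch below assumes a hypothetical `CircuitOpenError` and helper `call_with_fallback`; real libraries raise their own exception types, so the class to catch will differ.

```python
class CircuitOpenError(Exception):
    """Hypothetical stand-in for a library's 'circuit is open' exception."""


def call_with_fallback(protected_call, fallback):
    """Run the protected call; use the fallback only when the circuit is open."""
    try:
        return protected_call()
    except CircuitOpenError:
        # The call was rejected without reaching the service.
        return fallback()
```

Other exceptions are deliberately not caught here, so real failures still surface to the caller and to monitoring.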

Real-World Use Cases

🎬 Netflix

Challenge: Millions of users streaming content with hundreds of microservices
Solution: Hystrix circuit breakers protecting critical paths
Results:
  • 99.99% uptime despite service failures
  • Graceful degradation during peak traffic
  • Fast recovery from cascade failures

🏦 Financial Trading

Challenge: High-frequency trading with strict latency requirements
Solution: Fast-fail circuit breakers with cached prices
Results:
  • Sub-millisecond fallback responses
  • Prevented market data feed cascades
  • Maintained trading during partner outages

🛒 E-commerce

Challenge: Payment processing during Black Friday traffic spikes
Solution: Circuit breakers with payment queue fallbacks
Results:
  • Maintained checkout flow during payment gateway issues
  • Processed queued payments after recovery
  • Preserved customer shopping sessions

📱 Social Media

Challenge: Real-time notifications across global user base
Solution: Circuit breakers protecting notification pipelines
Results:
  • Prevented notification storm cascades
  • Maintained core features during outages
  • Improved user experience with fallback content
