Communication Patterns
Master the patterns and protocols that enable services to communicate effectively in distributed systems. Learn when to use synchronous vs asynchronous communication and how to handle failures gracefully.
Types of Service Communication
How services communicate fundamentally shapes your system's performance, reliability, and complexity. Each approach has distinct trade-offs that affect user experience and operational requirements.
Communication Design Principles
- • Loose Coupling: Services should depend on contracts, not implementations
- • Fault Tolerance: Assume network calls will fail and design accordingly
- • Performance: Choose patterns that match your latency and throughput requirements
- • Consistency: Decide between strong consistency and availability based on business needs
Synchronous (Request-Response)
Direct communication in which the sender waits for a response
Use Cases
- • User-facing APIs
- • Real-time queries
- • Simple CRUD operations
Challenges
- • Blocking calls
- • Cascading failures
- • Tight coupling
Asynchronous (Fire-and-Forget)
The sender does not wait for a response and continues processing
Use Cases
- • Background processing
- • Event notifications
- • Data pipelines
Challenges
- • Message ordering
- • Duplicate handling
- • Dead letter queues
Event-Driven Architecture
Services communicate by publishing and reacting to domain events that describe state changes
Use Cases
- • Microservices coordination
- • Real-time analytics
- • CQRS/Event Sourcing
Challenges
- • Event schema evolution
- • Eventual consistency
- • Complex debugging
Performance Characteristics
Different communication patterns have vastly different performance characteristics. Choose based on your system's latency, throughput, and reliability requirements.
[Chart: Synchronous (HTTP/gRPC) vs. Asynchronous (Queues) vs. Event-Driven]
Common Messaging Patterns
These patterns provide proven solutions for common distributed system communication challenges. Each pattern addresses specific reliability, scalability, and consistency requirements.
Point-to-Point (Queue)
A producer sends messages to a queue, and each message is processed by exactly one consumer
Benefits
- • Simple model
- • Load balancing
- • Guaranteed processing
Drawbacks
- • Single consumer bottleneck
- • No broadcast capability
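A minimal in-process sketch of the point-to-point model, assuming Python's standard `queue.Queue` stands in for a real message broker. The function name `run_competing_consumers` and the `order-N` message names are hypothetical. Several workers share one queue, which spreads the load, and each message is still handled by exactly one of them.

```python
import queue
import threading


def run_competing_consumers(messages, worker_count=3):
    """Deliver each message to exactly one of several workers sharing a queue."""
    q = queue.Queue()
    processed = []               # (worker_id, message) pairs
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            msg = q.get()
            if msg is None:      # sentinel: no more work for this worker
                q.task_done()
                return
            with lock:
                processed.append((worker_id, msg))
            q.task_done()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for t in threads:
        t.start()
    for msg in messages:
        q.put(msg)
    for _ in threads:
        q.put(None)              # one sentinel per worker
    q.join()                     # wait until every item has been acknowledged
    for t in threads:
        t.join()
    return processed


result = run_competing_consumers([f"order-{i}" for i in range(10)])
delivered = sorted(msg for _, msg in result)
```

Which worker receives which message is not deterministic, but no message is handled twice and none is lost.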
Publish-Subscribe (Topic)
A producer publishes to a topic, and every interested consumer receives its own copy
Benefits
- • Multiple consumers
- • Loose coupling
- • Easy to add new consumers
Drawbacks
- • Potential duplicate delivery
- • Consumer management complexity
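The publish-subscribe model can be sketched with a small in-memory broker. This is an illustration only: the `InMemoryBroker` class and the `order.created` topic are hypothetical, and a real broker would add persistence, acknowledgments, and network delivery.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy topic broker: every subscriber of a topic receives each event."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of handlers

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        delivered = 0
        for handler in list(self._subscribers.get(topic, [])):
            handler(dict(event))                # each subscriber gets its own copy
            delivered += 1
        return delivered


broker = InMemoryBroker()
emails, analytics = [], []
broker.subscribe("order.created", emails.append)
broker.subscribe("order.created", analytics.append)
count = broker.publish("order.created", {"order_id": 42})
```

The publisher never references the email or analytics consumers directly, so a third consumer can be added with one more `subscribe` call.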
Request-Reply with Correlation
An asynchronous request whose response is matched to it by a correlation ID
Benefits
- • Non-blocking
- • Timeout handling
- • Parallel processing
Drawbacks
- • Correlation complexity
- • Response handling
- • State management
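A hedged sketch of request-reply with correlation IDs, assuming two in-process queues in place of broker queues. `RequestReplyClient` and `echo_upper_service` are hypothetical names. The client tags each request with a UUID, tracks a `Future` per ID, and a background thread resolves the matching future when a reply arrives.

```python
import queue
import threading
import uuid
from concurrent.futures import Future


class RequestReplyClient:
    def __init__(self, request_queue, reply_queue):
        self._requests = request_queue
        self._replies = reply_queue
        self._pending = {}                       # correlation_id -> Future
        self._lock = threading.Lock()
        threading.Thread(target=self._dispatch_replies, daemon=True).start()

    def send(self, payload):
        correlation_id = str(uuid.uuid4())
        future = Future()
        with self._lock:
            self._pending[correlation_id] = future
        self._requests.put({"correlation_id": correlation_id, "payload": payload})
        return future                            # caller is not blocked

    def _dispatch_replies(self):
        while True:
            reply = self._replies.get()
            with self._lock:
                future = self._pending.pop(reply["correlation_id"], None)
            if future is not None:               # replies with unknown IDs are ignored
                future.set_result(reply["payload"])


def echo_upper_service(request_queue, reply_queue):
    """Stand-in for a remote service: copies the correlation ID into its reply."""
    while True:
        req = request_queue.get()
        reply_queue.put({
            "correlation_id": req["correlation_id"],
            "payload": req["payload"].upper(),
        })


requests_q, replies_q = queue.Queue(), queue.Queue()
threading.Thread(
    target=echo_upper_service, args=(requests_q, replies_q), daemon=True
).start()
client = RequestReplyClient(requests_q, replies_q)
futures = [client.send(word) for word in ("alpha", "beta")]
answers = [f.result(timeout=2) for f in futures]   # raises TimeoutError if no reply
```

The `_pending` dictionary is the state management cost listed above. This sketch does not evict entries for requests that time out, which a production client would need to do.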
Saga Pattern
Coordinate a long-running transaction across services as a sequence of local steps, each with a compensating action
Benefits
- • Distributed transactions
- • Failure recovery
- • Business process modeling
Drawbacks
- • Complex implementation
- • Compensation logic
- • Debugging difficulty
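The core of an orchestrated saga can be sketched in a few lines. This is a simplified, single-process illustration: `run_saga`, `SagaFailed`, and the inventory, payment, and shipping steps are all hypothetical, and a real saga would persist its progress so it can resume after a crash.

```python
class SagaFailed(Exception):
    pass


def run_saga(steps, context):
    """Run (name, action, compensation) steps; undo completed ones in reverse on failure."""
    completed = []
    for name, action, compensate in steps:
        try:
            action(context)
            completed.append((name, compensate))
        except Exception as exc:
            for _done_name, undo in reversed(completed):
                undo(context)                    # compensate newest step first
            raise SagaFailed(f"step '{name}' failed: {exc}") from exc
    return context


log = []


def reserve_inventory(ctx): log.append("reserve")
def release_inventory(ctx): log.append("release")
def charge_payment(ctx): log.append("charge")
def refund_payment(ctx): log.append("refund")
def schedule_shipping(ctx): raise RuntimeError("no carrier available")
def cancel_shipping(ctx): log.append("cancel_shipping")


steps = [
    ("inventory", reserve_inventory, release_inventory),
    ("payment", charge_payment, refund_payment),
    ("shipping", schedule_shipping, cancel_shipping),
]

try:
    run_saga(steps, {"order_id": 1})
    outcome = "committed"
except SagaFailed:
    outcome = "rolled_back"
```

Because shipping fails, the payment is refunded and then the inventory is released. The failed step itself is not compensated, since it never completed.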
Real-World Implementations
Learn how major tech companies implement communication patterns at scale. These examples show practical applications and lessons learned from production systems.
Netflix
1M+ events/second
Video upload triggers encoding, thumbnail generation, metadata extraction, and CDN distribution
- • Event schemas matter
- • Dead letter queues essential
- • Monitor everything
Uber
100K+ rides/minute
Real-time location updates (async) + ride matching logic (sync) + payment processing (async)
- • Latency vs consistency trade-offs
- • Circuit breakers prevent cascades
- • Idempotency is crucial
Slack
10M+ concurrent connections
Real-time message delivery with guaranteed ordering and offline support
- • Connection management is hard
- • Message ordering matters
- • Graceful degradation
Shopify
1M+ orders/day
Inventory check → Payment → Fulfillment → Shipping with full audit trail
- • Event sourcing enables replay
- • Compensation is complex
- • Testing sagas is critical
Error Handling & Resilience Patterns
Failure Handling
Circuit Breaker
Prevent cascading failures by failing fast when downstream services are unhealthy
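A minimal circuit breaker sketch, assuming a simple consecutive-failure threshold and a fixed reset timeout. The class, its parameters, and the injected `clock` are illustrative choices, not a reference implementation. While open, calls fail immediately without touching the downstream service. After the timeout, one trial call is let through.

```python
import time


class CircuitOpenError(Exception):
    pass


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None                   # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        self._failures = 0                       # success closes the circuit
        self._opened_at = None
        return result


# Usage with a fake clock so the example runs instantly.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])


def flaky():
    raise ConnectionError("downstream unavailable")


outcomes = []
for _ in range(3):
    try:
        breaker.call(flaky)
    except CircuitOpenError:
        outcomes.append("rejected")
    except ConnectionError:
        outcomes.append("failed")

now[0] = 11.0                                    # past the reset timeout
recovered = breaker.call(lambda: "ok")
```

The third call is rejected without invoking `flaky` at all, which is what keeps a struggling dependency from dragging its callers down with it.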
Retry with Backoff
Automatically retry failed requests with increasing delays
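One common way to implement this is exponential backoff with jitter. The sketch below assumes the delay doubles per attempt up to a cap and that a random fraction of it is actually slept ("full jitter"). The function name, defaults, and the choice of retryable exceptions are assumptions for illustration.

```python
import random
import time


def retry_with_backoff(func, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call func, retrying transient errors with capped exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise                            # out of attempts: surface the error
            # Delay doubles each attempt, capped at max_delay; jitter spreads
            # out clients that would otherwise retry at the same moment.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))


attempts = []


def sometimes_fails():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return "done"


waits = []                                       # record delays instead of sleeping
retry_result = retry_with_backoff(sometimes_fails, sleep=waits.append)
```

Retries are only safe when the operation is idempotent, and errors that will never succeed (such as validation failures) should not be listed in `retry_on`.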
Dead Letter Queue
Route messages that can't be processed to a separate queue for investigation
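A small sketch of dead-letter routing, assuming an in-memory `deque` as the main queue and a fixed attempt limit. The `consume` function and the message fields are hypothetical. A message that keeps failing is moved aside with its error so it stops blocking the rest of the queue.

```python
from collections import deque


def consume(messages, handler, max_attempts=3):
    """Process messages; park those that keep failing on a dead letter queue."""
    main_queue = deque({"body": m, "attempts": 0} for m in messages)
    dead_letters = []
    while main_queue:
        msg = main_queue.popleft()
        try:
            handler(msg["body"])
        except Exception as exc:
            msg["attempts"] += 1
            if msg["attempts"] >= max_attempts:
                msg["error"] = repr(exc)         # keep the reason for investigation
                dead_letters.append(msg)
            else:
                main_queue.append(msg)           # requeue for another attempt
    return dead_letters


handled = []


def parse_amount(body):
    handled.append(int(body))                    # raises ValueError on bad input


dlq = consume(["10", "not-a-number", "25"], parse_amount)
```

The malformed message is attempted three times and then parked, while the two valid messages are processed normally.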
Data Consistency
Idempotency
Ensure operations can be safely retried: repeating a request has no additional effect beyond the first successful execution
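A common way to get this property is an idempotency key supplied by the caller. The sketch below is an assumption-laden illustration: `PaymentService`, its in-memory result store, and the `req-123` key are hypothetical, and a real service would store keys durably and make the check-and-apply step atomic (for example with a unique constraint).

```python
class PaymentService:
    def __init__(self):
        self.balance = 100
        self._results = {}                       # idempotency key -> stored result

    def charge(self, idempotency_key, amount):
        if idempotency_key in self._results:
            # Replay of a request already handled: return the stored result
            # instead of charging a second time.
            return self._results[idempotency_key]
        self.balance -= amount
        result = {"charged": amount, "balance": self.balance}
        self._results[idempotency_key] = result
        return result


service = PaymentService()
first = service.charge("req-123", 30)
retry = service.charge("req-123", 30)            # e.g. client retried after a timeout
```

Both calls return the same result and the balance is reduced only once.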
Outbox Pattern
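Ensure database updates and message publishing happen atomically by writing the message to an outbox table in the same transaction as the update; a separate relay then publishes it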
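The outbox pattern can be sketched with Python's built-in `sqlite3` module. The table layout, the `place_order` and `relay_outbox` functions, and the `order.placed` topic are hypothetical. The business row and the event row are written in one transaction, and a separate relay step publishes pending events afterwards.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        topic TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")


def place_order(order_id):
    # One transaction: the order row and its event commit or roll back together.
    with conn:
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order.placed", json.dumps({"order_id": order_id})),
        )


def relay_outbox(publish):
    """Publish pending rows in insertion order, marking each one after it is sent."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


sent = []
place_order(1)
relayed = relay_outbox(lambda topic, event: sent.append((topic, event)))
relayed_again = relay_outbox(lambda topic, event: sent.append((topic, event)))
```

If the relay crashes after publishing but before marking the row, the event is published again on the next run. Delivery is therefore at-least-once, and consumers need to be idempotent.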
Eventual Consistency
Accept temporary inconsistency for better availability and performance
Communication Technology Comparison
| Technology | Type | Latency | Throughput | Ordering | Best For |
|---|---|---|---|---|---|
| HTTP/REST | Sync | Medium | Medium | N/A | User-facing APIs, CRUD operations |
| gRPC | Sync | Low | High | N/A | Service-to-service, high performance |
| Apache Kafka | Async | Low | Very High | Strong (per partition) | Event streaming, real-time analytics |
| RabbitMQ | Async | Medium | Medium | Optional | Task queues, workflow orchestration |
| WebSocket | Async | Very Low | Medium | Strong (per connection) | Real-time updates, gaming, chat |
Communication Pattern Decision Guide
Use Synchronous When:
- ☐ User is waiting for response (interactive)
- ☐ Strong consistency required
- ☐ Simple request-response pattern
- ☐ Error handling needs to be immediate
- ☐ Low latency is critical
Use Asynchronous When:
- ☐ Background processing acceptable
- ☐ High throughput required
- ☐ Loose coupling preferred
- ☐ Fault tolerance is critical
- ☐ Multiple consumers needed
🎯 Communication Pattern Successes and Failures
Learn from real-world communication pattern decisions and their impact on system reliability
1. WhatsApp Message Delivery Reliability
Context
How WhatsApp ensures message delivery across 2+ billion users
Outcome
Asynchronous message queuing with persistent storage enables reliable delivery even when recipients are offline. Messages stored until successful delivery.
Key Lessons
- • Store-and-forward queuing is essential for reliable mobile messaging
- • Acknowledge messages at multiple levels (sent, delivered, read)
- • Design offline-first: assume the network is unreliable
- • End-to-end encryption must work with asynchronous delivery patterns
2. Twitter Tweet Fanout Architecture
Context
Twitter's evolution from pull to push-based tweet delivery
Outcome
Hybrid push-pull model: push tweets to followers' timelines for regular users, pull-based for celebrities with millions of followers to avoid fanout explosion.
Key Lessons
- • Pure push and pure pull models both break down at scale; hybrid approaches are often optimal
- • Celebrity users require different communication patterns than regular users
- • Pre-computation (push) trades storage for read speed
- • Fan-out explosion is a real problem at scale and needs safeguards such as circuit breakers
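The hybrid push-pull idea can be illustrated with a toy in-memory model. Everything here is an assumption for the sake of the example: the `TimelineService` class, the follower-count cutoff `CELEBRITY_THRESHOLD`, and the data layout are hypothetical and are not a description of Twitter's actual implementation.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3                          # hypothetical cutoff, tiny for the demo


class TimelineService:
    def __init__(self):
        self.followers = defaultdict(set)        # author -> users following them
        self.following = defaultdict(set)        # user -> authors they follow
        self.inboxes = defaultdict(list)         # user -> pushed (seq, author, text)
        self.own_posts = defaultdict(list)       # author -> (seq, author, text)
        self._seq = 0

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def post(self, author, text):
        self._seq += 1
        entry = (self._seq, author, text)
        self.own_posts[author].append(entry)
        if not self.is_celebrity(author):
            for follower in self.followers[author]:
                self.inboxes[follower].append(entry)    # push: fan-out on write

    def timeline(self, user):
        entries = set(self.inboxes[user])               # set avoids duplicates
        for author in self.following[user]:
            if self.is_celebrity(author):
                entries.update(self.own_posts[author])  # pull: fan-out on read
        return [text for _, _, text in sorted(entries, reverse=True)]


svc = TimelineService()
svc.follow("ann", "bob")
for fan in ("ann", "carl", "dee"):
    svc.follow(fan, "star")
svc.post("bob", "bob: hello")
svc.post("star", "star: new single")
ann_timeline = svc.timeline("ann")
```

Posting as `star` writes a single record regardless of follower count, while posting as `bob` writes one inbox entry per follower. Reads for followers of `star` pay the merge cost instead.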
3. Netflix Video Streaming Coordination
Context
Netflix's event-driven architecture for video processing pipeline
Outcome
Event-driven pipeline with Kafka: upload triggers encoding, thumbnail generation, metadata extraction, CDN distribution. Each step publishes completion events.
Key Lessons
- • Complex workflows benefit from event-driven choreography over orchestration
- • Each processing step should be idempotent and independently scalable
- • Dead letter queues are essential for video processing failures
- • Event schemas must evolve gracefully as pipeline requirements change
4. Zoom Video Conferencing During COVID
Context
How Zoom scaled from 10M to 300M daily participants
Outcome
WebRTC for peer-to-peer when possible, server routing for large groups. Real-time communication with graceful degradation (audio-only fallback).
Key Lessons
- • Real-time media generally relies on UDP-based protocols for low latency
- • The choice between peer-to-peer and server-mediated routing depends on group size
- • Graceful degradation is critical: video → audio → text chat fallbacks
- • Auto-scaling video infrastructure is extremely challenging because connections are stateful
5. Discord Server and Voice Chat Scale
Context
Discord's real-time communication architecture for gaming communities
Outcome
WebSocket for text messages with guaranteed delivery, UDP for voice with real-time priority. Regional gateway distribution for low latency.
Key Lessons
- • Different communication types need different protocols and guarantees
- • Gaming requires ultra-low latency: sacrifice reliability for speed when needed
- • Regional gateway distribution is essential for global real-time applications
- • Voice communication can tolerate some packet loss but not high latency
6. Airbnb Booking Saga Implementation
Context
Airbnb's distributed transaction handling for booking flow
Outcome
Saga pattern coordinates: availability check → reservation hold → payment → confirmation → notification. Each step has compensation logic for failures.
Key Lessons
- • The saga pattern is essential for multi-service transactions without two-phase commit (2PC)
- • Compensation logic is often more complex than the original operation
- • Timeout handling is critical: a service cannot wait indefinitely for responses
- • Business logic must be designed with partial failure scenarios in mind