Communication Patterns

20 min read•Advanced

Master the patterns and protocols that enable services to communicate effectively in distributed systems. Learn when to use synchronous vs asynchronous communication and how to handle failures gracefully.

Types of Service Communication

How services communicate fundamentally shapes your system's performance, reliability, and complexity. Each approach has distinct trade-offs that affect user experience and operational requirements.

Communication Design Principles

• Loose Coupling: Services should depend on contracts, not implementations
• Fault Tolerance: Assume network calls will fail and design accordingly
• Performance: Choose patterns that match your latency and throughput requirements
• Consistency: Decide between strong consistency and availability based on business needs

Synchronous (Request-Response)

Direct communication where sender waits for response

Latency: 10-100 ms
Complexity: Low
Reliability: Medium
Protocols: HTTP/REST, gRPC, GraphQL

Best Use Cases

• User-facing APIs
• Real-time queries
• Simple CRUD operations

Common Challenges

• Blocking calls
• Cascading failures
• Tight coupling

Asynchronous (Fire-and-Forget)

Sender doesn't wait for response, continues processing

Latency: 1-10 ms
Complexity: Medium
Reliability: High
Protocols: Message Queues, Event Streams, Webhooks

Best Use Cases

• Background processing
• Event notifications
• Data pipeline

Common Challenges

• Message ordering
• Duplicate handling
• Dead letter queues

Event-Driven Architecture

Services communicate through domain events and state changes

Latency: 1-50 ms
Complexity: High
Reliability: Very High
Protocols: Apache Kafka, AWS EventBridge, RabbitMQ

Best Use Cases

• Microservices coordination
• Real-time analytics
• CQRS/Event Sourcing

Common Challenges

• Event schema evolution
• Eventual consistency
• Complex debugging

Performance Characteristics

Different communication patterns have vastly different performance characteristics. Choose based on your system's latency, throughput, and reliability requirements.

Response Time

100 ms: sync HTTP
5 ms: async queue (time for the sender to enqueue; the work itself completes later)

Throughput

1,000 req/sec: sync HTTP
50,000 req/sec: async queue

Fault Tolerance (relative score)

60%: sync HTTP
95%: async queue

Pattern	Latency	Throughput	Debugging
Synchronous (HTTP/gRPC)	Medium (network round trip + processing)	Limited (blocking calls reduce concurrency)	Easy (request/response correlation)
Asynchronous (Queues)	Low (non-blocking, immediate return)	High (parallel processing, buffering)	Hard (eventual consistency, correlation)
Event-Driven	Variable (depends on event processing)	Very High (stream processing, parallelism)	Very Hard (complex event flows, timing)

Common Messaging Patterns

These patterns provide proven solutions for common distributed system communication challenges. Each pattern addresses specific reliability, scalability, and consistency requirements.

Point-to-Point (Queue)

One producer sends to one consumer via queue

Example Flow

Order processing: Order Service → Payment Queue → Payment Service

Guarantees

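Each message is delivered to a single consumer. Most brokers provide at-least-once delivery, so exactly-once processing depends on idempotent consumers.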

Pros

• Simple model
• Load balancing
• Guaranteed processing

Cons

• Bottleneck when only a single consumer reads the queue
• No broadcast capability

Best for: Background jobs, task processing
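
As a minimal sketch of the point-to-point model, the snippet below uses Python's in-process `queue.Queue` as a stand-in for a real broker. The function name `run_workers` and the handler are hypothetical; the point is only that each task is taken from the queue by exactly one of several competing workers.

```python
import queue
import threading


def run_workers(tasks, handler, worker_count=3):
    """Put tasks on one queue and let competing workers drain it.

    Each task is removed from the queue by exactly one worker, which is
    how a point-to-point queue spreads load across consumers.
    """
    q = queue.Queue()
    for task in tasks:
        q.put(task)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = q.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            outcome = handler(task)
            with lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Completion order is not guaranteed across workers, which is one reason message ordering appears among the challenges of asynchronous messaging.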

Publish-Subscribe (Topic)

One producer sends to multiple interested consumers

Example Flow

User registration: User Service → User Created Event → Email, Analytics, Billing

Guarantees

At-least-once delivery to all subscribers

Pros

• Multiple consumers
• Loose coupling
• Easy to add new consumers

Cons

• Potential duplicate delivery
• Consumer management complexity

Best for: Event notifications, real-time updates
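
The user-registration flow above can be sketched with a small in-memory event bus. This is an illustration only: `EventBus` and the topic name are hypothetical, and a real system would use a broker rather than direct callbacks.

```python
from collections import defaultdict


class EventBus:
    """In-memory publish-subscribe: one event fans out to every subscriber."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        """Deliver the event to all current subscribers; return how many got it."""
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(event)
        return len(callbacks)
```

The publisher never names its consumers, so adding a new subscriber requires no change to the publishing service.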

Request-Reply with Correlation

Asynchronous request with response correlation ID

Example Flow

Price calculation: Order Service → Pricing Request → Pricing Response

Guarantees

Response matched to original request

Pros

• Non-blocking
• Timeout handling
• Parallel processing

Cons

• Correlation complexity
• Response handling
• State management

Best for: Long-running computations, external API calls
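
One way to picture correlation is a table of pending requests keyed by a generated ID. The sketch below assumes a hypothetical `send` callable as the transport and a message shape with `correlation_id` and `payload` fields; neither comes from a specific broker API.

```python
import uuid
from concurrent.futures import Future


class Requester:
    """Matches asynchronous replies to requests using a correlation ID."""

    def __init__(self, send):
        self._send = send      # hypothetical transport: callable taking a dict
        self._pending = {}     # correlation_id -> Future awaiting the reply

    def request(self, payload):
        correlation_id = str(uuid.uuid4())
        future = Future()
        self._pending[correlation_id] = future
        self._send({"correlation_id": correlation_id, "payload": payload})
        return future          # caller may wait with future.result(timeout=...)

    def on_reply(self, message):
        """Resolve the matching request; ignore unknown or duplicate replies."""
        future = self._pending.pop(message["correlation_id"], None)
        if future is None:
            return False
        future.set_result(message["payload"])
        return True
```

Because replies are matched by ID rather than arrival order, responses can return out of order, and a caller that stops waiting after a timeout simply never reads its future.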

Saga Pattern

Coordinate long-running transactions across services

Example Flow

Booking flow: Reserve Hotel → Reserve Flight → Charge Card (with compensations)

Guarantees

Either all steps complete or compensate

Pros

• Distributed transactions
• Failure recovery
• Business process modeling

Cons

• Complex implementation
• Compensation logic
• Debugging difficulty

Best for: Multi-service transactions, business workflows
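
The booking flow above can be sketched as an orchestrated saga: run each step in order and, on the first failure, run the compensations of the completed steps in reverse. The `run_saga` helper and the step tuple layout are assumptions made for illustration.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps; undo completed ones on failure.

    Returns (succeeded, log) where log records what happened in order.
    """
    completed = []
    log = []
    for name, action, compensation in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Compensate in reverse order of completion.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            return False, log
        log.append(f"done:{name}")
        completed.append((name, compensation))
    return True, log
```

This sketch assumes compensations themselves succeed. In practice they also need retries and idempotency, which is a large part of why compensation logic is listed as a drawback.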

Real-World Implementations

Learn how major tech companies implement communication patterns at scale. These examples show practical applications and lessons learned from production systems.

Netflix

1M+ events/second

Video Streaming Pipeline

Video upload triggers encoding, thumbnail generation, metadata extraction, CDN distribution

Pattern & Tech

Event-driven with Kafka

Apache Kafka, AWS SQS, Hystrix Circuit Breaker

Key Lessons

• Event schemas matter
• Dead letter queues essential
• Monitor everything

Uber

100K+ rides/minute

Ride Matching System

Real-time location updates (async) + ride matching logic (sync) + payment processing (async)

Pattern & Tech

Hybrid sync/async

Apache Kafka, Redis Streams, gRPC

Key Lessons

• Latency vs consistency trade-offs
• Circuit breakers prevent cascades
• Idempotency is crucial

Slack

10M+ concurrent connections

Message Delivery

Real-time message delivery with guaranteed ordering and offline support

Pattern & Tech

WebSocket + Message Queues

WebSocket, Apache Pulsar, Redis

Key Lessons

• Connection management is hard
• Message ordering matters
• Graceful degradation

Shopify

1M+ orders/day

Order Processing

Inventory check → Payment → Fulfillment → Shipping with full audit trail

Pattern & Tech

Saga with Event Sourcing

EventStore, RabbitMQ, GraphQL

Key Lessons

• Event sourcing enables replay
• Compensation is complex
• Testing sagas is critical

Error Handling & Resilience Patterns

Failure Handling

Circuit Breaker

Prevent cascading failures by failing fast when downstream services are unhealthy

Example: After 5 failures in 30s, open circuit for 60s
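
A minimal sketch of the rule in the example (5 failures within 30 s opens the circuit for 60 s). The class and parameter names are hypothetical, the clock is injectable so the behavior can be tested without waiting, and the half-open state is simplified to "allow calls again after the cool-down".

```python
import time
from collections import deque


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=5, window=30.0, open_for=60.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.window = window
        self.open_for = open_for
        self.clock = clock
        self._failures = deque()   # timestamps of recent failures
        self._opened_at = None

    def call(self, fn, *args, **kwargs):
        now = self.clock()
        if self._opened_at is not None:
            if now - self._opened_at < self.open_for:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: close the circuit and try again.
            self._opened_at = None
            self._failures.clear()
        try:
            return fn(*args, **kwargs)
        except Exception:
            self._failures.append(now)
            # Drop failures that fell out of the sliding window.
            while self._failures and now - self._failures[0] > self.window:
                self._failures.popleft()
            if len(self._failures) >= self.max_failures:
                self._opened_at = now
            raise
```

While the circuit is open the downstream service is not called at all, which gives it room to recover and keeps callers from piling up on timeouts.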

Retry with Backoff

Automatically retry failed requests with increasing delays

Example: Retry after 1s, 2s, 4s, 8s, then give up
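
The schedule in the example can be sketched as follows. The function name is hypothetical, and `sleep` is a parameter so the delays can be observed in tests instead of actually waiting.

```python
import time


def retry_with_backoff(fn, delays=(1, 2, 4, 8), sleep=time.sleep):
    """Call fn; on failure wait 1s, 2s, 4s, 8s between attempts, then give up.

    With the default delays this makes at most five attempts in total.
    """
    for delay in delays:
        try:
            return fn()
        except Exception:
            sleep(delay)
    return fn()  # final attempt: any exception propagates to the caller
```

Production retry loops commonly add random jitter to each delay so that many clients do not retry at the same instant, and they retry only errors known to be transient.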

Dead Letter Queue

Route messages that can't be processed to special queue for investigation

Example: After 3 failed processing attempts, move to DLQ
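
A sketch of the three-attempts rule, using plain lists in place of real queues. The `consume` function and the dead-letter record fields are hypothetical.

```python
def consume(messages, handler, max_attempts=3):
    """Process messages; after max_attempts failures move a message to the DLQ.

    Returns (processed_results, dead_letters).
    """
    processed = []
    dead_letters = []
    for message in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                processed.append(handler(message))
                break
            except Exception as exc:
                if attempt == max_attempts:
                    # Keep the error alongside the message for investigation.
                    dead_letters.append({
                        "message": message,
                        "error": repr(exc),
                        "attempts": attempt,
                    })
    return processed, dead_letters
```

The failing message no longer blocks the ones behind it, and the stored error gives an operator somewhere to start.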

Data Consistency

Idempotency

Ensure operations can be safely retried without side effects

Example: Use unique request IDs to prevent duplicate processing
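
A minimal sketch of request-ID deduplication. `IdempotentProcessor` is a hypothetical name, and the in-memory dictionary stands in for what would need to be a durable store shared by all consumer instances.

```python
class IdempotentProcessor:
    """Runs the handler at most once per request ID and replays the result."""

    def __init__(self, handler):
        self._handler = handler
        self._results = {}  # request_id -> stored result (in-memory for the sketch)

    def process(self, request_id, payload):
        if request_id in self._results:
            # Duplicate delivery or client retry: return the earlier result.
            return self._results[request_id]
        result = self._handler(payload)
        self._results[request_id] = result
        return result
```

This sketch is not safe under concurrent duplicates. A real implementation would record the ID and the side effect in one transaction or rely on a unique constraint.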

Outbox Pattern

Ensure database updates and message publishing happen atomically

Example: Write message to database table, separate process publishes
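
The outbox idea can be sketched with SQLite from the standard library. The table and function names are hypothetical, and `publish` is an assumed callable standing in for the broker client.

```python
import json
import sqlite3


def create_schema(conn):
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "published INTEGER NOT NULL DEFAULT 0)"
    )


def place_order(conn, order_id, item):
    """Write the business row and the outgoing event in one transaction."""
    event = json.dumps({"type": "OrderPlaced", "order_id": order_id})
    with conn:  # commits both inserts together, or rolls both back on error
        conn.execute("INSERT INTO orders (id, item) VALUES (?, ?)",
                     (order_id, item))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (event,))


def relay_once(conn, publish):
    """Separate process: publish unpublished rows, then mark them as sent."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?",
                         (row_id,))
    return len(rows)
```

If the relay crashes after publishing but before marking the row, the event is published again on the next run, so delivery is at-least-once and consumers should be idempotent.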

Eventual Consistency

Accept temporary inconsistency for better availability and performance

Example: User sees their own posts immediately, others see eventually

Communication Technology Comparison

Technology	Type	Latency	Throughput	Ordering	Best For
HTTP/REST	Sync	Medium	Medium	N/A	User-facing APIs, CRUD operations
gRPC	Sync	Low	High	N/A	Service-to-service, high performance
Apache Kafka	Async	Low	Very High	Strong (per partition)	Event streaming, real-time analytics
RabbitMQ	Async	Medium	Medium	Optional	Task queues, workflow orchestration
WebSocket	Async	Very Low	Medium	Strong	Real-time updates, gaming, chat

Communication Pattern Decision Guide

Use Synchronous When:

☐ User is waiting for response (interactive)
☐ Strong consistency required
☐ Simple request-response pattern
☐ Error handling needs to be immediate
☐ Low latency is critical

Use Asynchronous When:

☐ Background processing acceptable
☐ High throughput required
☐ Loose coupling preferred
☐ Fault tolerance is critical
☐ Multiple consumers needed

🎯 Communication Pattern Successes and Failures

Learn from real-world communication pattern decisions and their impact on system reliability

Scenarios

WhatsApp Message Delivery Reliability

How WhatsApp ensures message delivery across 2+ billion users

Twitter Tweet Fanout Architecture

Twitter's evolution from pull to push-based tweet delivery

Netflix Video Streaming Coordination

Netflix's event-driven architecture for video processing pipeline

Zoom Video Conferencing During COVID

How Zoom scaled from 10M to 300M daily participants

Discord Server and Voice Chat Scale

Discord's real-time communication architecture for gaming communities

Airbnb Booking Saga Implementation

Airbnb's distributed transaction handling for booking flow

1. WhatsApp Message Delivery Reliability

Context

How WhatsApp ensures message delivery across 2+ billion users

Metrics

Daily Messages

100+ billion messages

Delivery Success

99.9% success rate

Pattern Used

Store-and-forward queuing

Offline Support

30-day message retention

Outcome

Asynchronous message queuing with persistent storage enables reliable delivery even when recipients are offline. Messages stored until successful delivery.

Key Lessons

•Store-and-forward queuing essential for mobile messaging reliability
•Message acknowledgments at multiple levels (sent, delivered, read)
•Offline-first design: assume network is unreliable
•End-to-end encryption must work with asynchronous delivery patterns
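
Store-and-forward with acknowledgments can be sketched generically as below. This is not WhatsApp's implementation: the class, its methods, and the convention that `deliver` returns `True` as an acknowledgment are all assumptions for illustration.

```python
from collections import defaultdict, deque


class StoreAndForward:
    """Holds messages per recipient until the recipient acknowledges them."""

    def __init__(self):
        self._pending = defaultdict(deque)  # recipient -> queued messages
        self._online = {}                   # recipient -> deliver callback

    def send(self, recipient, message):
        self._pending[recipient].append(message)  # store first
        self._flush(recipient)                    # then try to forward

    def connect(self, recipient, deliver):
        self._online[recipient] = deliver
        self._flush(recipient)

    def disconnect(self, recipient):
        self._online.pop(recipient, None)

    def pending_count(self, recipient):
        return len(self._pending[recipient])

    def _flush(self, recipient):
        deliver = self._online.get(recipient)
        while deliver and self._pending[recipient]:
            message = self._pending[recipient][0]
            if not deliver(message):   # no ack: keep it for a later attempt
                break
            self._pending[recipient].popleft()
```

A message is removed only after the acknowledgment, so a lost connection mid-delivery results in a redelivery rather than a lost message.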

2. Twitter Tweet Fanout Architecture

Context

Twitter's evolution from pull to push-based tweet delivery

Metrics

Tweet Volume

500M tweets/day

Timeline Generation

Pre-computed for most users

Celebrity Tweets

Pull-based for high followers

Latency Improvement

10x faster timelines

Outcome

Hybrid push-pull model: push tweets to followers' timelines for regular users, pull-based for celebrities with millions of followers to avoid fanout explosion.

Key Lessons

•Pure push or pull models don't scale - hybrid approaches often optimal
•Celebrity users require different communication patterns than regular users
•Pre-computation (push) trades storage for read speed
•Fan-out explosion problem real at scale - need circuit breakers
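
The hybrid model can be sketched as fan-out on write for regular authors and fan-out on read for high-follower authors. The threshold value, class name, and tuple layout are hypothetical and chosen small so the example is easy to follow; this is not Twitter's code.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # hypothetical cutoff; real systems use far larger values


class Timelines:
    def __init__(self, followers):
        self.followers = followers              # author -> set of follower ids
        self.following = defaultdict(set)       # follower -> set of authors
        for author, fans in followers.items():
            for fan in fans:
                self.following[fan].add(author)
        self.inbox = defaultdict(list)          # precomputed timelines (push)
        self.celebrity_posts = defaultdict(list)  # fetched at read time (pull)

    def is_celebrity(self, author):
        return len(self.followers.get(author, ())) >= CELEBRITY_THRESHOLD

    def post(self, author, seq, text):
        tweet = (seq, author, text)
        if self.is_celebrity(author):
            self.celebrity_posts[author].append(tweet)  # one write, no fan-out
        else:
            for fan in self.followers.get(author, ()):
                self.inbox[fan].append(tweet)           # push to each follower

    def read(self, user):
        merged = list(self.inbox[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                merged.extend(self.celebrity_posts[author])
        return sorted(merged, reverse=True)             # newest first by seq
```

Regular posts cost one write per follower but make reads cheap, while celebrity posts cost one write in total and a small merge at read time.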

3. Netflix Video Streaming Coordination

Context

Netflix's event-driven architecture for video processing pipeline

Metrics

Video Processing

1000+ hours uploaded/day

Encoding Formats

100+ different outputs

Pipeline Reliability

99.9% success rate

Processing Time

90% within 4 hours

Outcome

Event-driven pipeline with Kafka: upload triggers encoding, thumbnail generation, metadata extraction, CDN distribution. Each step publishes completion events.

Key Lessons

•Complex workflows benefit from event-driven choreography over orchestration
•Each processing step should be idempotent and independently scalable
•Dead letter queues essential for video processing failures
•Event schemas must evolve gracefully as pipeline requirements change

4. Zoom Video Conferencing During COVID

Context

How Zoom scaled from 10M to 300M daily participants

Metrics

Peak Users

300M daily participants

Growth Rate

30x increase in 4 months

Latency Requirement

<150ms end-to-end

Availability

99.99% during peak usage

Outcome

WebRTC for peer-to-peer when possible, server routing for large groups. Real-time communication with graceful degradation (audio-only fallback).

Key Lessons

•Real-time media favors UDP-based protocols for low latency, because a retransmitted packet usually arrives too late to be useful
•Peer-to-peer vs server-mediated routing depends on group size
•Graceful degradation critical: video → audio → text chat fallbacks
•Auto-scaling video infrastructure extremely challenging due to stateful connections

5. Discord Server and Voice Chat Scale

Context

Discord's real-time communication architecture for gaming communities

Metrics

Concurrent Voice Users

4M+ simultaneous

Message Latency

<50ms globally

Voice Latency

<40ms voice transmission

Architecture

WebSocket + UDP hybrid

Outcome

WebSocket for text messages with guaranteed delivery, UDP for voice with real-time priority. Regional gateway distribution for low latency.

Key Lessons

•Different communication types need different protocols and guarantees
•Gaming requires ultra-low latency - sacrifice reliability for speed when needed
•Regional gateway distribution essential for global real-time applications
•Voice communication can tolerate packet loss but not latency

6. Airbnb Booking Saga Implementation

Context

Airbnb's distributed transaction handling for booking flow

Metrics

Booking Steps

7 service interactions

Success Rate

98.5% booking completion

Rollback Cases

1.5% require compensation

Average Duration

2.3 seconds end-to-end

Outcome

Saga pattern coordinates: availability check → reservation hold → payment → confirmation → notification. Each step has compensation logic for failures.

Key Lessons

•Saga pattern essential for multi-service transactions without two-phase commit (2PC)
•Compensation logic often more complex than the original operation
•Timeout handling critical - cannot wait indefinitely for responses
•Business logic must be designed with partial failure scenarios in mind
