Design a Multi-Channel Notification System

Build a scalable notification platform that delivers billions of messages across multiple channels while respecting user preferences and ensuring reliable delivery.

System Requirements

Functional Requirements

  • Send multi-channel notifications (email, SMS, push, in-app)
  • Template management with personalization
  • User preference and subscription management
  • Scheduled and batch notifications
  • Priority-based delivery queuing
  • Real-time and transactional notifications
  • Delivery tracking and analytics
  • Unsubscribe and opt-out handling

Non-Functional Requirements

  • Send 100M+ notifications daily
  • Sub-second queuing latency
  • 99.9% delivery success rate
  • Handle 50K notifications/second peak
  • Support 100+ notification templates
  • Multi-region deployment for low latency
  • Complete audit trail for compliance
  • Graceful degradation during provider outages

Multi-Channel Delivery Strategy

Email

Throughput: 1M/hour
Delivery rate: 98%
Latency: 1-5 min
Providers: SendGrid, AWS SES, Mailgun
Key Considerations:
  • SPF/DKIM setup
  • IP warming
  • Bounce handling
  • Spam scoring

SMS

Throughput: 100K/hour
Delivery rate: 95%
Latency: 5-30 sec
Providers: Twilio, AWS SNS, MessageBird
Key Considerations:
  • Carrier filtering
  • Short codes
  • Cost optimization
  • Regional compliance

Push Notifications

Throughput: 5M/hour
Delivery rate: 90%
Latency: < 1 sec
Providers: FCM, APNs, Web Push
Key Considerations:
  • Token management
  • Silent notifications
  • Rich media
  • Platform differences

In-App

Throughput: 10M/hour
Delivery rate: 100%
Latency: Real-time
Transports: WebSocket, SSE, long polling
Key Considerations:
  • Connection management
  • Offline sync
  • Read receipts
  • Badge counts

System Architecture Components

Notification Service

  • API gateway
  • Request validation
  • Priority queuing
  • Rate limiting
  • Idempotency
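
The validation, rate limiting, and idempotency steps listed above can be sketched as a single intake path. This is a minimal in-memory illustration, not a production design: the names `NotificationIntake`, `TokenBucket`, and `VALID_CHANNELS`, the request fields, and the status strings are all assumptions made for the example, and a real deployment would keep the idempotency keys and buckets in a shared store rather than in process memory.

```python
import time
import uuid
from typing import Optional

# Hypothetical channel names, chosen for this example only.
VALID_CHANNELS = {"email", "sms", "push", "in_app"}


class TokenBucket:
    """Permits `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated: Optional[float] = None

    def allow(self, now: float) -> bool:
        if self.updated is None:
            self.updated = now
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class NotificationIntake:
    """Validates, de-duplicates, and rate-limits incoming requests."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate, self.burst = rate, burst
        self._seen: dict = {}     # (sender, idempotency key) -> notification id
        self._buckets: dict = {}  # sender -> TokenBucket

    def submit(self, sender: str, idempotency_key: str, request: dict,
               now: Optional[float] = None) -> tuple:
        now = time.monotonic() if now is None else now
        # 1. Request validation.
        if not request.get("user_id") or request.get("channel") not in VALID_CHANNELS:
            return "invalid", ""
        # 2. Idempotency: a repeated key returns the original id, no new work.
        key = (sender, idempotency_key)
        if key in self._seen:
            return "duplicate", self._seen[key]
        # 3. Per-sender rate limiting.
        bucket = self._buckets.setdefault(sender, TokenBucket(self.rate, self.burst))
        if not bucket.allow(now):
            return "rate_limited", ""
        notification_id = str(uuid.uuid4())
        self._seen[key] = notification_id
        return "accepted", notification_id
```

Checking the idempotency key before the rate limiter means that a client retrying the same request does not consume extra tokens.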

Template Engine

  • Template storage
  • Personalization
  • A/B testing
  • Localization
  • Version control
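
As a rough illustration of template storage, personalization, localization, and version control working together, the sketch below uses Python's standard `string.Template`. The `TemplateStore` class, its method names, and the fallback-to-English rule are assumptions for this example; A/B variant assignment is left out.

```python
from string import Template
from typing import Optional


class TemplateStore:
    """Keeps every published version of each (name, locale) template."""

    def __init__(self, default_locale: str = "en") -> None:
        self.default_locale = default_locale
        self._templates: dict = {}  # (name, locale) -> list of Template, oldest first

    def publish(self, name: str, locale: str, body: str) -> int:
        """Store a new version and return its 1-based version number."""
        versions = self._templates.setdefault((name, locale), [])
        versions.append(Template(body))
        return len(versions)

    def render(self, name: str, locale: str, params: dict,
               version: Optional[int] = None) -> str:
        # Localization: fall back to the default locale if none is published.
        versions = (self._templates.get((name, locale))
                    or self._templates[(name, self.default_locale)])
        # Version control: latest version unless one is pinned explicitly.
        template = versions[-1] if version is None else versions[version - 1]
        # Personalization: substitute() raises KeyError on a missing
        # placeholder, so an incomplete payload fails loudly.
        return template.substitute(params)


store = TemplateStore()
store.publish("welcome", "en", "Hi $first_name, welcome!")
store.publish("welcome", "en", "Hello $first_name, welcome aboard!")
store.publish("welcome", "es", "Hola $first_name, ¡bienvenido!")

print(store.render("welcome", "en", {"first_name": "Ada"}))
print(store.render("welcome", "en", {"first_name": "Ada"}, version=1))
print(store.render("welcome", "fr", {"first_name": "Ada"}))  # falls back to "en"
```

Keeping old versions addressable by number is what makes pinning, rollback, and side-by-side comparison possible.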

Queue Manager

  • Priority queues
  • Dead letter queues
  • Retry logic
  • Batch processing
  • Circuit breakers
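
The interaction between priority queues, retry logic, and the dead letter queue can be shown with a small in-memory model. It is only a sketch: in the system described here the queues would live in a broker such as Kafka, and the class name `QueueManager`, the `HIGH`/`NORMAL` constants, and the exponential backoff parameters are assumptions for the example.

```python
import heapq
import itertools

HIGH, NORMAL = 0, 1  # lower number = served first


class QueueManager:
    def __init__(self, max_attempts: int = 3, base_delay: float = 1.0) -> None:
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self._ready: list = []    # heap of (priority, seq, job)
        self._delayed: list = []  # heap of (ready_at, seq, job) awaiting retry
        self._seq = itertools.count()  # tie-breaker that preserves FIFO order
        self.dead_letters: list = []

    def enqueue(self, payload: dict, priority: int = NORMAL) -> None:
        job = {"payload": payload, "priority": priority, "attempts": 0}
        heapq.heappush(self._ready, (priority, next(self._seq), job))

    def process_next(self, send, now: float):
        """Run one job; return its outcome, or None if nothing is ready."""
        # Promote retries whose backoff has elapsed.
        while self._delayed and self._delayed[0][0] <= now:
            _, _, job = heapq.heappop(self._delayed)
            heapq.heappush(self._ready, (job["priority"], next(self._seq), job))
        if not self._ready:
            return None
        _, _, job = heapq.heappop(self._ready)
        job["attempts"] += 1
        try:
            send(job["payload"])
            return "delivered"
        except Exception:
            if job["attempts"] >= self.max_attempts:
                # Retries exhausted: park the job for manual inspection.
                self.dead_letters.append(job)
                return "dead_lettered"
            # Exponential backoff: base_delay, 2x, 4x, ...
            delay = self.base_delay * 2 ** (job["attempts"] - 1)
            heapq.heappush(self._delayed, (now + delay, next(self._seq), job))
            return "retry_scheduled"
```

Keeping delayed retries in their own heap means a job waiting out its backoff never blocks ready work of any priority.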

Channel Adapters

  • Provider abstraction
  • Failover logic
  • Cost optimization
  • Format conversion
  • Delivery confirmation
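
Provider abstraction and failover might look like the following sketch, where each provider sits behind one small interface and the adapter tries them in preference order. `Provider`, `FakeProvider`, `ChannelAdapter`, and `ProviderError` are hypothetical names, and no real provider SDK is called. The failure counter acts as a very simplified circuit breaker; a production breaker would also re-probe a tripped provider after a cool-down.

```python
from typing import Protocol


class ProviderError(Exception):
    """Raised when a provider cannot accept a message."""


class Provider(Protocol):
    name: str

    def send(self, to: str, body: str) -> str:
        """Return a provider message id, or raise ProviderError."""
        ...


class FakeProvider:
    """Stand-in for a real provider client, used only for this example."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy

    def send(self, to: str, body: str) -> str:
        if not self.healthy:
            raise ProviderError(f"{self.name} unavailable")
        return f"{self.name}-{to}"


class ChannelAdapter:
    """Tries providers in order and skips those that keep failing."""

    def __init__(self, providers: list, failure_threshold: int = 3) -> None:
        self.providers = providers
        self.failure_threshold = failure_threshold
        self.failures = {p.name: 0 for p in providers}

    def send(self, to: str, body: str) -> tuple:
        for provider in self.providers:
            if self.failures[provider.name] >= self.failure_threshold:
                continue  # circuit open: do not call a provider that keeps failing
            try:
                message_id = provider.send(to, body)
            except ProviderError:
                self.failures[provider.name] += 1
                continue  # fail over to the next provider
            self.failures[provider.name] = 0
            return provider.name, message_id
        raise ProviderError("all providers unavailable")


primary = FakeProvider("primary", healthy=False)
backup = FakeProvider("backup")
adapter = ChannelAdapter([primary, backup], failure_threshold=2)
print(adapter.send("user-1", "Your code is 1234"))  # served by the backup
```

Ordering the provider list by cost is one simple way to fold cost optimization into the same loop.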

Preference Service

  • User preferences
  • Opt-out management
  • Quiet hours
  • Channel routing
  • Frequency capping
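
One possible ordering of the opt-out, quiet hours, and frequency cap checks is sketched below. Several things are assumptions for illustration: the `Preferences` fields and their defaults, the decision strings, the rule that critical messages bypass quiet hours and caps but never an opt-out, and the simplification that `now` is already expressed in the user's local time.

```python
from dataclasses import dataclass, field
from datetime import datetime, time, timedelta


@dataclass
class Preferences:
    opted_out: set = field(default_factory=set)  # channels the user unsubscribed from
    quiet_start: time = time(22, 0)
    quiet_end: time = time(7, 0)
    daily_cap: int = 5  # max non-critical notifications per rolling 24 hours


def in_quiet_hours(prefs: Preferences, moment: time) -> bool:
    if prefs.quiet_start <= prefs.quiet_end:
        return prefs.quiet_start <= moment < prefs.quiet_end
    # The window wraps past midnight, e.g. 22:00 to 07:00.
    return moment >= prefs.quiet_start or moment < prefs.quiet_end


class PreferenceService:
    def __init__(self) -> None:
        self.prefs: dict = {}     # user_id -> Preferences
        self.sent_log: dict = {}  # user_id -> timestamps of recent sends

    def decide(self, user_id: str, channel: str, now: datetime,
               critical: bool = False) -> str:
        prefs = self.prefs.get(user_id, Preferences())
        # Opt-outs are always honored, even for critical messages.
        if channel in prefs.opted_out:
            return "drop:opted_out"
        if critical:
            return "send"
        if in_quiet_hours(prefs, now.time()):
            return "defer:quiet_hours"
        # Frequency cap over a rolling 24-hour window.
        recent = [t for t in self.sent_log.get(user_id, [])
                  if now - t < timedelta(days=1)]
        if len(recent) >= prefs.daily_cap:
            return "drop:frequency_cap"
        recent.append(now)
        self.sent_log[user_id] = recent
        return "send"
```

Returning a deferral rather than a drop for quiet hours lets the scheduler re-enqueue the message for the end of the quiet window.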

Analytics Service

  • Delivery tracking
  • Open/click rates
  • Bounce handling
  • Engagement metrics
  • Campaign analytics
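
To make the metrics concrete, the sketch below rolls a stream of delivery events up into per-channel rates. The event shape (a `(channel, status)` pair), the status names, and the choice of denominators (opens over delivered, clicks over opens) are assumptions for this example; real pipelines often define these rates differently.

```python
from collections import Counter, defaultdict


def summarize(events) -> dict:
    """Aggregate (channel, status) events into per-channel rates."""
    counts = defaultdict(Counter)
    for channel, status in events:
        counts[channel][status] += 1

    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else 0.0

    report = {}
    for channel, c in counts.items():
        report[channel] = {
            "delivery_rate": ratio(c["delivered"], c["sent"]),
            "bounce_rate": ratio(c["bounced"], c["sent"]),
            "open_rate": ratio(c["opened"], c["delivered"]),
            "click_rate": ratio(c["clicked"], c["opened"]),
        }
    return report


events = (
    [("email", "sent")] * 4
    + [("email", "delivered")] * 3
    + [("email", "bounced")]
    + [("email", "opened")] * 2
    + [("email", "clicked")]
)
print(summarize(events)["email"])
```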

Capacity Estimation

Notification Volume & Distribution

Channel Distribution
  • Push: 45%
  • Email: 35%
Priority Levels
  • High priority: 15%
  • Normal: 85%
Delivery Time
  • Immediate: 70%
  • Scheduled: 30%

Performance Metrics

  • Daily notifications: 100M+ (peak: 50K/sec)
  • Queue latency: < 500ms (P99: 1 second)
  • Delivery success: 99.2% across all channels
  • Template cache hit rate: 95% (Redis cluster)

Infrastructure Requirements

  • Queue infrastructure: Kafka, 50 brokers, 500TB storage
  • Processing workers: 1000+ auto-scaled containers
  • Storage: 100TB for templates and analytics
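
A quick back-of-the-envelope check ties these figures together. The inputs are the numbers stated in this section, taking the lower bounds of "100M+" and "1000+"; the even spread of load across workers and of storage across brokers is a simplifying assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60

daily_notifications = 100_000_000  # "100M+" per day
peak_per_sec = 50_000              # stated peak rate
workers = 1_000                    # "1000+" containers
kafka_brokers = 50
kafka_storage_tb = 500

# Average rate implied by the daily volume.
average_per_sec = daily_notifications / SECONDS_PER_DAY
# How much headroom the peak requires over the average.
peak_to_average = peak_per_sec / average_per_sec
# Load per worker at peak, assuming an even spread.
per_worker_peak = peak_per_sec / workers
# Storage per broker, assuming an even spread.
storage_per_broker_tb = kafka_storage_tb / kafka_brokers

print(f"average rate:       {average_per_sec:,.0f}/sec")      # 1,157/sec
print(f"peak / average:     {peak_to_average:.1f}x")          # 43.2x
print(f"per-worker at peak: {per_worker_peak:.0f}/sec")       # 50/sec
print(f"storage per broker: {storage_per_broker_tb:.0f} TB")  # 10 TB
```

The peak is roughly 43 times the daily average, which is why the worker fleet is auto-scaled rather than provisioned at a fixed size.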

Practice Questions

1. Design a multi-level rate limiting system that enforces limits per user, per channel, and per provider.

2. How would you ensure exactly-once delivery semantics across distributed workers and multiple retry attempts?

3. Design a template versioning system that supports A/B testing and gradual rollouts.

4. How would you handle provider outages and implement intelligent failover between notification providers?

5. Design a priority queue system that ensures critical notifications are delivered even during high load.