
Design a Chat System (WhatsApp/Slack)

Design a real-time messaging system supporting 100M daily active users with 50M concurrent connections, sub-100ms message delivery, and 99.99% availability.

Real-time, WebSockets, Message Ordering

1.1 Clarifying Questions & Requirements

Q: What's the expected scale and user behavior patterns?
A: Support 100M daily active users, 50M peak concurrent connections, 1B messages per day. Mix of 1:1 chats and group chats (up to 256 members). Peak usage during business hours and evenings.
Engineering Implications: Massive concurrent connections require efficient WebSocket management and horizontal scaling. Peak patterns drive caching strategies and infrastructure provisioning. Group chat fanout complexity grows with member count.
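The per-message fanout step for a group chat can be sketched as follows. This is a minimal illustration with several assumptions. A hypothetical in-memory set `online` stands in for the real presence and connection registry. `fan_out`, `FanoutResult`, and `MAX_GROUP_SIZE` are illustrative names, not part of any specific system. Each recipient is routed either to a live WebSocket delivery or to a push notification, so the work per group message grows linearly with member count.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_GROUP_SIZE = 256  # group limit from the requirements above


@dataclass
class FanoutResult:
    websocket: list[str] = field(default_factory=list)  # recipients with a live connection
    push: list[str] = field(default_factory=list)       # offline recipients (APNs/FCM)


def fan_out(sender: str, members: list[str], online: set[str]) -> FanoutResult:
    """Split a group message's recipients into live deliveries and push notifications.

    `online` is a hypothetical stand-in for a presence/connection registry lookup.
    """
    if len(members) > MAX_GROUP_SIZE:
        raise ValueError(f"group exceeds {MAX_GROUP_SIZE} members")
    result = FanoutResult()
    for member in members:
        if member == sender:
            continue  # the sender's own device already has the message
        if member in online:
            result.websocket.append(member)
        else:
            result.push.append(member)
    return result
```

For example, `fan_out("alice", ["alice", "bob", "carol"], {"alice", "bob"})` sends to `bob` over WebSocket and queues a push for `carol`. A 256-member group costs up to 255 deliveries per message.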
Q: What are the core messaging features and delivery requirements?
A: Real-time messaging with <100ms latency, message ordering within conversations, delivery receipts (sent/delivered/read), typing indicators, online presence, message history, search, and media sharing.
Engineering Implications: Real-time delivery needs WebSocket connections with fallback to push notifications. Message ordering requires careful sequence numbering and conflict resolution. Presence updates need efficient broadcasting mechanisms.
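Per-conversation sequence numbering, paired with a client-side reorder buffer, is one common way to get the ordering guarantee described above. The sketch below makes two assumptions. First, each conversation has exactly one sequencer, so a plain counter is enough. Second, the transport may deliver messages out of order or redeliver them after a reconnect. `ConversationSequencer` and `ReorderBuffer` are hypothetical names, and a real sequencer would persist its counter durably.

```python
from __future__ import annotations

from collections import defaultdict


class ConversationSequencer:
    """Assigns gap-free, increasing sequence numbers per conversation.

    Assumes a single owner per conversation, so no cross-node coordination is needed.
    """

    def __init__(self) -> None:
        self._next: dict[str, int] = defaultdict(lambda: 1)

    def assign(self, conversation_id: str) -> int:
        seq = self._next[conversation_id]
        self._next[conversation_id] = seq + 1
        return seq


class ReorderBuffer:
    """Client side: releases messages in sequence order and drops duplicates."""

    def __init__(self, last_delivered: int = 0) -> None:
        self.last_delivered = last_delivered
        self._pending: dict[int, str] = {}

    def receive(self, seq: int, body: str) -> list[str]:
        if seq <= self.last_delivered or seq in self._pending:
            return []  # duplicate, e.g. redelivered after a reconnect
        self._pending[seq] = body
        ready = []
        # Release every message that is now contiguous with what was already shown.
        while self.last_delivered + 1 in self._pending:
            self.last_delivered += 1
            ready.append(self._pending.pop(self.last_delivered))
        return ready
```

If message 2 arrives before message 1, `receive(2, "b")` returns `[]`. The following `receive(1, "a")` then returns `["a", "b"]`, and a repeated `receive(2, "b")` returns `[]`.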
Q: What are the availability and consistency requirements?
A: 99.99% availability required (about 4.3 minutes of downtime per month). Message ordering must be consistent within conversations. Eventual consistency is acceptable for presence and read receipts.
Engineering Implications: High availability requires multi-region deployment with failover. Message ordering needs distributed consensus or careful partitioning, such as a single owner per conversation. Applying weaker consistency to presence and receipts trades some accuracy for lower latency and higher throughput.
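The "careful partitioning" option can be sketched as stable hashing of a conversation ID to a partition. Every message in a conversation then goes through the same owner, which can order it without consensus. This is the same idea as keyed partitioning in Kafka, which preserves order within a partition, but Kafka's own partitioner uses a different hash. `partition_for`, `direct_conversation_id`, and `NUM_PARTITIONS` are hypothetical. SHA-256 is used only because Python's built-in `hash()` is randomized per process for strings.

```python
import hashlib

NUM_PARTITIONS = 64  # hypothetical partition count


def direct_conversation_id(user_a: str, user_b: str) -> str:
    """Derive one ID for a 1:1 chat, so both directions map to the same partition."""
    return ":".join(sorted((user_a, user_b)))


def partition_for(conversation_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a conversation to a partition with a hash that is stable across processes."""
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

One caveat applies to plain modulo hashing: changing `num_partitions` remaps most conversations. Consistent hashing, or a large fixed partition count assigned to servers, keeps reshuffling small when the cluster grows.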
Q: How should offline users and push notifications work?
A: Messages must be delivered to offline users via push notifications (APNs/FCM). Support message history sync when users come back online. Handle network disconnections gracefully with retry mechanisms.
Engineering Implications: Offline delivery requires durable message queuing and integration with platform notification services. Sync mechanisms need efficient protocols to handle message backlogs and prevent data loss during disconnections.
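Reconnect sync and retry-safe sends can be sketched with a cursor. The client remembers the highest sequence number it has stored and asks only for what came after, in pages, so a large backlog does not arrive as one response. The sketch assumes gap-free per-conversation sequence numbers and a client-generated message ID used to deduplicate retried sends. `ConversationLog`, `sync`, and `catch_up` are hypothetical names, and the in-memory list stands in for a durable store.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    seq: int
    client_msg_id: str  # generated on the sender's device; makes retries idempotent
    body: str


class ConversationLog:
    """In-memory stand-in for a durable per-conversation message store."""

    def __init__(self) -> None:
        self._messages: list[Message] = []
        self._by_client_id: dict[str, Message] = {}

    def append(self, client_msg_id: str, body: str) -> Message:
        # A retried send with the same client_msg_id returns the stored original.
        existing = self._by_client_id.get(client_msg_id)
        if existing is not None:
            return existing
        msg = Message(seq=len(self._messages) + 1, client_msg_id=client_msg_id, body=body)
        self._messages.append(msg)
        self._by_client_id[client_msg_id] = msg
        return msg

    def sync(self, after_seq: int, limit: int = 100) -> tuple[list[Message], bool]:
        """Return up to `limit` messages with seq > after_seq, plus whether more remain."""
        start = min(max(after_seq, 0), len(self._messages))  # seq n lives at index n - 1
        page = self._messages[start:start + limit]
        return page, start + limit < len(self._messages)


def catch_up(log: ConversationLog, last_seq: int, page_size: int = 100) -> tuple[list[Message], int]:
    """Client loop run after reconnecting: page through the backlog, advancing the cursor."""
    received: list[Message] = []
    while True:
        page, has_more = log.sync(last_seq, page_size)
        received.extend(page)
        if page:
            last_seq = page[-1].seq
        if not has_more:
            return received, last_seq
```

A client that last stored sequence 1 calls `catch_up(log, 1)` and resumes from the returned cursor. Because a retried `append` returns the original message, a dropped acknowledgment never produces a duplicate.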
Q: What about media sharing, security, and compliance?
A: Support photos, videos, and files up to 100MB. End-to-end encryption for message content. GDPR compliance for user data. Content moderation for harmful content. Message retention policies (delete after 1 year).
Engineering Implications: Media storage needs CDN distribution and efficient compression. E2E encryption complicates search and moderation because the server cannot read message content. Compliance requires data export/deletion capabilities and audit trails. Content moderation needs ML-based detection systems.

1.2 Back-of-the-Envelope Calculations

Traffic & Message Volume

Daily messages: 1B messages/day (~11,574/sec avg, 50K/sec peak)
Concurrent users: 50M peak connections (50% of 100M DAU online simultaneously)
Message fanout: 2B operations/day (1:1 messages at 1x + group messages at avg 5x, implying ~25% of messages are group messages)
Presence updates: 500M updates/day (5 status changes per active user)
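The traffic figures can be reproduced with a few lines of arithmetic. This makes two derived quantities explicit: the peak-to-average ratio, and the share of group messages implied by the 2B/day fanout. Two assumptions apply: one connection per online user, and decimal units. All constants come from the estimates above.

```python
# Inputs copied from the traffic estimates above.
SECONDS_PER_DAY = 86_400
DAU = 100_000_000
MESSAGES_PER_DAY = 1_000_000_000
PEAK_MESSAGES_PER_SEC = 50_000
PEAK_CONNECTIONS = 50_000_000
FANOUT_OPS_PER_DAY = 2_000_000_000
GROUP_FANOUT = 5                  # average deliveries per group message
STATUS_CHANGES_PER_USER = 5

avg_msgs_per_sec = MESSAGES_PER_DAY / SECONDS_PER_DAY
peak_to_avg = PEAK_MESSAGES_PER_SEC / avg_msgs_per_sec
online_fraction = PEAK_CONNECTIONS / DAU  # assumes one connection per online user

# If a share g of messages goes to groups: (1 - g) * 1 + g * GROUP_FANOUT = ops per message.
ops_per_message = FANOUT_OPS_PER_DAY / MESSAGES_PER_DAY
group_share = (ops_per_message - 1) / (GROUP_FANOUT - 1)

presence_updates_per_day = DAU * STATUS_CHANGES_PER_USER

print(f"avg {avg_msgs_per_sec:,.0f} msg/s; peak is {peak_to_avg:.1f}x average")
print(f"online at peak: {online_fraction:.0%} of DAU; group share: {group_share:.0%}")
print(f"presence updates: {presence_updates_per_day:,}/day")
```

The 50K/sec peak is about 4.3x the daily average. Provisioning for that peak, not for the average, drives the server counts below.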

Storage Requirements

Message size: 200 bytes avg (text 100B + metadata 100B)
Daily storage: 200GB/day (1B messages × 200 bytes)
Media storage: 10TB/day (5% of messages contain media, ~200KB average per item)
Total, 1 year: ~4PB (messages 73TB + media 3.65PB + indexes; the message figure is a single copy, so RF=3 stores ~219TB)
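The storage estimate works out as follows, in decimal units with integer arithmetic to avoid rounding noise. The replication factor is taken from the Cassandra RF=3 in the infrastructure estimate. The media figure assumes media lives in object storage outside that cluster, which is an assumption of this sketch.

```python
GB, TB, PB = 10**9, 10**12, 10**15  # decimal units

MESSAGES_PER_DAY = 1_000_000_000
BYTES_PER_MESSAGE = 200             # 100 B text + 100 B metadata
MEDIA_PERCENT = 5                   # share of messages carrying media
MEDIA_BYTES_PER_DAY = 10 * TB
REPLICATION_FACTOR = 3              # Cassandra RF from the infrastructure estimate
DAYS_PER_YEAR = 365

text_per_day = MESSAGES_PER_DAY * BYTES_PER_MESSAGE
media_msgs_per_day = MESSAGES_PER_DAY * MEDIA_PERCENT // 100
avg_media_item = MEDIA_BYTES_PER_DAY // media_msgs_per_day  # implied average media size

text_per_year = text_per_day * DAYS_PER_YEAR
media_per_year = MEDIA_BYTES_PER_DAY * DAYS_PER_YEAR
replicated_text_per_year = text_per_year * REPLICATION_FACTOR

print(f"text: {text_per_day // GB} GB/day, {text_per_year // TB} TB/year "
      f"({replicated_text_per_year // TB} TB at RF={REPLICATION_FACTOR})")
print(f"media: {avg_media_item // 1000} KB avg item, {media_per_year / PB:.2f} PB/year")
```

Media dominates the yearly total by roughly 50x over message text. This is why the media path (CDN, compression, object storage) matters more for cost than the message database does.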

Infrastructure Estimates

WebSocket servers: 5K instances (10K connections each = 50M capacity)
Chat services: 1K instances (stateless, horizontally scaled)
Database cluster: 100 nodes (Cassandra cluster, RF=3)
Message queue: 50 Kafka brokers (high-throughput event streaming)

Monthly Operating Costs

Compute (API/WS): $1.8M/month (6K instances × $300/month)
Database cluster: $600K/month (100 nodes × $6K/month)
Storage & CDN: $200K/month (4PB storage + bandwidth)
Total infrastructure: $2.6M/month (sum of the three line items above; networking, monitoring, and push costs are not itemized separately)
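Server counts and monthly cost can be derived together from the figures above. The check confirms that the three line items already sum to $2.6M. It also shows how much the WebSocket fleet grows once headroom is reserved. `TARGET_UTILIZATION_PCT` is a hypothetical planning value, not part of the original estimate. At 100% utilization there is no spare capacity for clients reconnecting after a server fails.

```python
# Inputs copied from the infrastructure and cost estimates above.
PEAK_CONNECTIONS = 50_000_000
CONNECTIONS_PER_WS_SERVER = 10_000
CHAT_SERVICE_INSTANCES = 1_000
COST_PER_INSTANCE = 300         # $/month
DB_NODES = 100
COST_PER_DB_NODE = 6_000        # $/month
STORAGE_AND_CDN = 200_000       # $/month
TARGET_UTILIZATION_PCT = 70     # hypothetical headroom target per WebSocket server


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding."""
    return -(-a // b)


ws_servers = ceil_div(PEAK_CONNECTIONS, CONNECTIONS_PER_WS_SERVER)
compute_instances = ws_servers + CHAT_SERVICE_INSTANCES
compute_cost = compute_instances * COST_PER_INSTANCE
db_cost = DB_NODES * COST_PER_DB_NODE
total_cost = compute_cost + db_cost + STORAGE_AND_CDN

# Same connection load, but each server is only filled to the target utilization.
ws_servers_with_headroom = ceil_div(PEAK_CONNECTIONS * 100,
                                    CONNECTIONS_PER_WS_SERVER * TARGET_UTILIZATION_PCT)

print(f"WebSocket servers: {ws_servers:,} (with headroom: {ws_servers_with_headroom:,})")
print(f"compute ${compute_cost:,}, database ${db_cost:,}, total ${total_cost:,} per month")
```

At the illustrative 70% target, the WebSocket tier needs 7,143 servers instead of 5,000. At $300 per instance, that adds roughly $643K/month to the compute line.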

Interview Practice Questions

Practice these open-ended questions to prepare for system design interviews. Think through each scenario and discuss trade-offs.

1. Enterprise Chat Platform: Design a Slack-like enterprise chat platform supporting 100K+ organizations with channels, threads, file sharing, integrations, and enterprise security. Address multi-tenancy, compliance, search, and custom workflows.

2. Global Voice & Video Integration: Add voice calls, video conferencing, and screen sharing to the chat system. Handle 10M+ concurrent calls with global routing, quality optimization, recording, and real-time features like live captions.

3. AI-Powered Chat Features: Integrate AI for smart replies, language translation, content moderation, sentiment analysis, and chat summarization. Address real-time processing, model serving, privacy, and user control over AI features.

4. Offline-First Mobile Experience: Design offline messaging with conflict resolution, optimistic UI updates, media sync, and seamless online/offline transitions. Handle message ordering, duplicate prevention, and storage optimization.

5. Cross-Platform Protocol Design: Create a unified messaging protocol supporting web, mobile, desktop, IoT devices, and third-party integrations. Address versioning, backward compatibility, feature negotiation, and efficient serialization.

6. Real-time Collaboration Features: Add collaborative editing, shared whiteboards, live cursors, and co-browsing to chat. Handle operational transformations, conflict resolution, permissions, and sub-100ms latency requirements for smooth UX.