Design a News Feed (Twitter/Facebook)
Build a scalable social media timeline system with personalized content ranking, fan-out strategies, and real-time updates for hundreds of millions of users.
System Requirements
Functional Requirements
- Users can post tweets (up to 280 characters)
- Users can follow other users
- Generate personalized timeline/news feed
- Users can like, retweet, and reply to tweets
- Real-time notifications for interactions
- Search functionality for tweets and users
- Trending topics and hashtags
- Media attachments (photos, videos)
Non-Functional Requirements
- 300M daily active users
- 500M tweets per day
- Timeline load time < 200ms
- 99.9% availability
- Handle celebrity users with 50M+ followers
- Global content distribution
- Eventual consistency for timeline updates
- Real-time updates for active users
Capacity Estimation
Usage Patterns
Traffic Analysis
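As a rough starting point, the average write rate follows directly from the figures in the non-functional requirements. The sketch below assumes traffic is spread evenly over the day. The peak multiplier and the timeline loads per user are illustrative assumptions, not numbers from this design.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures taken from the non-functional requirements.
daily_active_users = 300_000_000
tweets_per_day = 500_000_000

# Assumptions for illustration only (not part of the requirements).
peak_multiplier = 3            # assumed ratio of peak to average traffic
timeline_loads_per_user = 10   # assumed timeline loads per user per day

avg_write_qps = tweets_per_day / SECONDS_PER_DAY
peak_write_qps = avg_write_qps * peak_multiplier
avg_read_qps = daily_active_users * timeline_loads_per_user / SECONDS_PER_DAY

print(f"average tweet writes/sec:    {avg_write_qps:,.0f}")
print(f"assumed peak writes/sec:     {peak_write_qps:,.0f}")
print(f"average timeline reads/sec:  {avg_read_qps:,.0f}")
```

Under these assumptions, reads outnumber writes by roughly six to one, which is one reason read latency tends to drive the fan-out decision.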
Infrastructure Requirements
Fan-out Strategies
Fan-out on Write (Push Model)
Pre-compute timelines when tweets are posted
Pros
- Fast reads
- Simple read logic
- Good for average users
Cons
- Expensive writes for celebrities
- Storage overhead
- Stale data issues
Best For
Fan-out on Read (Pull Model)
Generate timelines on demand when users request them
Pros
- No storage waste
- Always fresh data
- Handles celebrities well
Cons
- Slow reads
- Complex aggregation
- Hot spotting issues
Best For
Hybrid Model
Push for most users, pull for celebrities, merge at read time
Pros
- Best of both worlds
- Optimized performance
- Scalable approach
Cons
- Complex implementation
- Merge complexity
- Cache invalidation
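The hybrid model can be sketched in a few lines: push tweets into follower timelines for ordinary authors, skip the push for celebrities, and merge celebrity tweets in at read time. This is a minimal in-memory sketch, not a production design. The names (`post_tweet`, `read_timeline`, `CELEBRITY_THRESHOLD`) are hypothetical, and the threshold is tiny only so the demo is easy to follow.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # assumed cutoff; a real system would use a far larger value

followers = defaultdict(set)           # author_id -> ids of users following them
following = defaultdict(set)           # user_id -> ids of authors they follow
tweets_by_author = defaultdict(list)   # author_id -> [(timestamp, tweet_id)]
timelines = defaultdict(list)          # user_id -> precomputed [(timestamp, tweet_id)]


def follow(user_id, author_id):
    followers[author_id].add(user_id)
    following[user_id].add(author_id)


def is_celebrity(author_id):
    return len(followers[author_id]) >= CELEBRITY_THRESHOLD


def post_tweet(author_id, tweet_id, timestamp):
    """Store the tweet; push it to follower timelines unless the author is a celebrity."""
    tweets_by_author[author_id].append((timestamp, tweet_id))
    if is_celebrity(author_id):
        return  # pull path: followers fetch these tweets at read time
    for follower_id in followers[author_id]:
        timelines[follower_id].append((timestamp, tweet_id))


def read_timeline(user_id, limit=10):
    """Merge the pushed timeline with tweets pulled from followed celebrities."""
    merged = set(timelines[user_id])  # a set avoids duplicates if an author changed tier
    for author_id in following[user_id]:
        if is_celebrity(author_id):
            merged.update(tweets_by_author[author_id])
    newest_first = sorted(merged, reverse=True)
    return [tweet_id for _, tweet_id in newest_first[:limit]]


# Demo: "star" has three followers (celebrity), "bob" has one (ordinary author).
follow("alice", "bob")
for fan in ("alice", "u1", "u2"):
    follow(fan, "star")

post_tweet("bob", "t1", 100)    # pushed to alice's timeline
post_tweet("star", "t2", 200)   # not pushed; pulled at read time
post_tweet("bob", "t3", 300)    # pushed to alice's timeline

alice_feed = read_timeline("alice")
print(alice_feed)
```

The write for `star` touches no follower timelines, so its cost stays constant regardless of follower count. The price is the extra merge work in `read_timeline`, which corresponds to the merge complexity listed under Cons.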
Best For
System Architecture
Tweet Service
Java microservices, MySQL/PostgreSQL
Fan-out Service
Kafka, Redis, async processing
Timeline Service
Redis, Cassandra, ranking ML models
Social Graph Service
Graph database (Neo4j), Redis cache
Content Ranking Algorithm
Content Quality
High Impact
Social Signals
Very High Impact
Personalization
High Impact
Trending & Viral
Medium Impact
Ranking Pipeline
Feature Extraction
ML Scoring
Timeline Assembly
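The three pipeline stages can be illustrated with a toy example. In this sketch a weighted sum stands in for the ML model, and the weights are assumptions chosen only to mirror the impact levels listed above (Very High > High > Medium). The feature names and function names are hypothetical.

```python
# Assumed weights, ordered to match the stated impact levels.
WEIGHTS = {
    "content_quality": 1.0,   # High impact
    "social_signals": 1.5,    # Very High impact
    "personalization": 1.0,   # High impact
    "trending": 0.5,          # Medium impact
}


def extract_features(raw_signals):
    """Feature extraction: clamp each signal into [0, 1]; missing signals count as 0."""
    return {
        name: min(max(raw_signals.get(name, 0.0), 0.0), 1.0)
        for name in WEIGHTS
    }


def score(features):
    """Stand-in for ML scoring: a weighted sum of the extracted features."""
    return sum(WEIGHTS[name] * features[name] for name in WEIGHTS)


def assemble_timeline(candidates, limit=20):
    """Timeline assembly: score every candidate and return the top tweet ids."""
    scored = [
        (score(extract_features(raw_signals)), tweet_id)
        for tweet_id, raw_signals in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tweet_id for _, tweet_id in scored[:limit]]


# Demo with three hypothetical candidate tweets.
candidates = {
    "t1": {"content_quality": 0.9, "trending": 0.8},
    "t2": {"social_signals": 0.8, "personalization": 0.5},
    "t3": {"trending": 1.0},
}
ranked = assemble_timeline(candidates)
print(ranked)
```

Here `t2` ranks first because social signals carry the largest assumed weight. A real ranker would replace `score` with a trained model but could keep the same three-stage shape.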
Scaling Challenges & Solutions
Celebrity Problem
Timeline Freshness
Hot Partition Problem
Ranking at Scale
Database Design
Tweet Schema
tweets table:
- tweet_id (UUID, PK)
- user_id (UUID, indexed)
- content (text, max 280 chars)
- media_urls (JSON array)
- created_at (timestamp)
- reply_to_tweet_id (UUID, nullable)
- retweet_of_tweet_id (UUID, nullable)
- like_count (int, default 0)
- retweet_count (int, default 0)
- reply_count (int, default 0)
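For illustration, the tweet row could be mirrored in application code as a small record type. This is a sketch under assumptions: the class name `Tweet` and the validation are hypothetical, and `len()` counts Unicode code points, which is a simplification of how a real service might measure the 280-character limit.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

MAX_TWEET_LENGTH = 280


@dataclass
class Tweet:
    user_id: uuid.UUID
    content: str
    media_urls: List[str] = field(default_factory=list)
    reply_to_tweet_id: Optional[uuid.UUID] = None      # set when this is a reply
    retweet_of_tweet_id: Optional[uuid.UUID] = None    # set when this is a retweet
    like_count: int = 0
    retweet_count: int = 0
    reply_count: int = 0
    tweet_id: uuid.UUID = field(default_factory=uuid.uuid4)
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def __post_init__(self):
        # Enforce the length limit from the functional requirements.
        if len(self.content) > MAX_TWEET_LENGTH:
            raise ValueError(
                f"content exceeds {MAX_TWEET_LENGTH} characters"
            )


# Demo: a valid tweet, then one that is a character too long.
author = uuid.uuid4()
tweet = Tweet(user_id=author, content="hello, timeline")

try:
    Tweet(user_id=author, content="x" * (MAX_TWEET_LENGTH + 1))
    too_long_rejected = False
except ValueError:
    too_long_rejected = True
```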
Timeline Cache Schema
Redis Timeline Cache:
Key: "timeline:{user_id}"
Value: Sorted Set (ZSET)
- Score: timestamp
- Member: tweet_id
Key: "tweet:{tweet_id}"
Value: Hash
- content, user_id, created_at
- like_count, retweet_count
- media_urls, etc.
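To show how the `timeline:{user_id}` sorted set behaves, here is an in-memory stand-in written with the standard library so it runs without a Redis server. It is a sketch, not a Redis client: `TimelineCache`, `add`, and `page` are hypothetical names, and the length cap is an assumption. In Redis itself, the corresponding operations would be `ZADD` to insert, `ZREMRANGEBYRANK` to trim, and a reverse range-by-score query to page.

```python
import bisect

MAX_TIMELINE_LENGTH = 800  # assumed cap on cached entries per user


class TimelineCache:
    def __init__(self, max_length=MAX_TIMELINE_LENGTH):
        self.max_length = max_length
        # key "timeline:{user_id}" -> list of (timestamp, tweet_id), oldest first
        self._timelines = {}

    def add(self, user_id, tweet_id, timestamp):
        """Insert in score order, then trim the oldest entries beyond the cap."""
        entries = self._timelines.setdefault(f"timeline:{user_id}", [])
        entry = (timestamp, tweet_id)
        index = bisect.bisect_left(entries, entry)
        if index < len(entries) and entries[index] == entry:
            return  # identical entry already cached
        entries.insert(index, entry)
        excess = len(entries) - self.max_length
        if excess > 0:
            del entries[:excess]

    def page(self, user_id, before=None, limit=20):
        """Return up to `limit` entries, newest first, with timestamp < `before`."""
        entries = self._timelines.get(f"timeline:{user_id}", [])
        if before is None:
            end = len(entries)
        else:
            # (before,) sorts ahead of every (before, tweet_id) pair.
            end = bisect.bisect_left(entries, (before,))
        start = max(end - limit, 0)
        return entries[start:end][::-1]


# Demo: a cap of 3 evicts the oldest of four tweets.
cache = TimelineCache(max_length=3)
for ts, tweet_id in [(1, "t1"), (2, "t2"), (3, "t3"), (4, "t4")]:
    cache.add("u1", tweet_id, ts)

first_page = cache.page("u1", limit=2)
next_cursor = first_page[-1][0]  # timestamp of the last item returned
second_page = cache.page("u1", before=next_cursor, limit=2)
```

Paging by score rather than by offset keeps pages stable when new tweets arrive at the head of the timeline. One caveat of this simple cursor: tweets sharing the exact timestamp of the cursor would be skipped, so a real implementation would likely need a tie-breaker such as the tweet id.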
Social Graph Schema
follows table:
- follower_id (UUID, part of composite PK)
- followee_id (UUID, part of composite PK)
- created_at (timestamp)
- relationship_type (follow/close_friend)
Indexes:
- (follower_id, created_at) for "following" list
- (followee_id, created_at) for "followers" list
- Composite index on (follower_id, followee_id) for checking a relationship in either direction
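The follows table and its indexes can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: SQLite is used only because it runs in memory, ids are stored as short text strings instead of UUIDs for readability, and the index and helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE follows (
        follower_id       TEXT NOT NULL,
        followee_id       TEXT NOT NULL,
        created_at        TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        relationship_type TEXT NOT NULL DEFAULT 'follow'
            CHECK (relationship_type IN ('follow', 'close_friend')),
        PRIMARY KEY (follower_id, followee_id)
    );
    CREATE INDEX idx_following ON follows (follower_id, created_at);
    CREATE INDEX idx_followers ON follows (followee_id, created_at);
    """
)

conn.executemany(
    "INSERT INTO follows (follower_id, followee_id, created_at) VALUES (?, ?, ?)",
    [
        ("alice", "bob", "2024-01-01"),
        ("alice", "carol", "2024-01-02"),
        ("dave", "bob", "2024-01-03"),
    ],
)


def following(user_id):
    """Accounts the user follows, most recent first."""
    rows = conn.execute(
        "SELECT followee_id FROM follows WHERE follower_id = ? "
        "ORDER BY created_at DESC",
        (user_id,),
    )
    return [row[0] for row in rows]


def followers_of(user_id):
    """Accounts following the user, most recent first."""
    rows = conn.execute(
        "SELECT follower_id FROM follows WHERE followee_id = ? "
        "ORDER BY created_at DESC",
        (user_id,),
    )
    return [row[0] for row in rows]


def is_following(follower_id, followee_id):
    """Point lookup served by the composite primary key."""
    row = conn.execute(
        "SELECT 1 FROM follows WHERE follower_id = ? AND followee_id = ?",
        (follower_id, followee_id),
    ).fetchone()
    return row is not None


print(following("alice"), followers_of("bob"), is_following("alice", "bob"))
```

The composite primary key also prevents duplicate follow rows, so a repeated follow request fails at the database instead of inflating follower counts.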
Practice Questions
How would you handle a celebrity with 50M followers posting a tweet? What's the fan-out strategy?
Design the ranking algorithm. What features would you use to determine tweet relevance for a user?
How would you implement timeline pagination efficiently? Consider both performance and consistency.
Design a system to detect and handle trending topics in real-time. How do you identify viral content?
How would you implement real-time notifications for likes, retweets, and mentions without overwhelming users?