Design a News Feed (Twitter/Facebook)

Build a scalable social media timeline system with personalized content ranking, fan-out strategies, and real-time updates for hundreds of millions of users.

System Requirements

Functional Requirements

  • Users can post tweets (up to 280 characters)
  • Users can follow other users
  • Generate personalized timeline/news feed
  • Users can like, retweet, and reply to tweets
  • Real-time notifications for interactions
  • Search functionality for tweets and users
  • Trending topics and hashtags
  • Media attachments (photos, videos)

Non-Functional Requirements

  • 300M daily active users
  • 500M tweets per day
  • Timeline load time < 200ms
  • 99.9% availability
  • Handle celebrity users with 50M+ followers
  • Global content distribution
  • Eventual consistency for timeline updates
  • Real-time updates for active users

Capacity Estimation

Usage Patterns

Read vs Write
  • Write: 1x
  • Read: 300x
User Activity
  • Passive: 80% of users
  • Active: 20% of users
Timeline Sources
  • Following: 60% of content
  • Algorithmic: 40% of content

Traffic Analysis

  • Daily Active Users: 300M (peak: 150M concurrent)
  • Tweets per Day: 500M (average: 1.7 per user)
  • Timeline Views: 50B/day (150 views per user)
  • Fan-out Factor: 150x average (500M tweets → 75B deliveries)
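The figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers listed in this section. The per-second rates are plain daily averages, which is an assumption, since the text gives no peak-to-average ratio. Note that 300M users × 150 views comes to 45B, which the figure above rounds up to 50B.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the requirements and traffic figures above.
daily_active_users = 300_000_000
tweets_per_day = 500_000_000
timeline_views_per_user = 150
avg_fanout = 150  # average number of timeline deliveries per tweet

# Derived daily totals.
tweets_per_user = tweets_per_day / daily_active_users
timeline_views_per_day = daily_active_users * timeline_views_per_user
deliveries_per_day = tweets_per_day * avg_fanout

# Average per-second rates (peaks will be higher).
avg_tweets_per_sec = tweets_per_day / SECONDS_PER_DAY
avg_reads_per_sec = timeline_views_per_day / SECONDS_PER_DAY
avg_deliveries_per_sec = deliveries_per_day / SECONDS_PER_DAY

print(f"tweets per user per day : {tweets_per_user:.1f}")
print(f"timeline views per day  : {timeline_views_per_day:,}")
print(f"fan-out deliveries/day  : {deliveries_per_day:,}")
print(f"avg tweets/second       : {avg_tweets_per_sec:,.0f}")
print(f"avg timeline reads/sec  : {avg_reads_per_sec:,.0f}")
print(f"avg deliveries/second   : {avg_deliveries_per_sec:,.0f}")
```

The average write rate lands just under 6K tweets/second, so a peak figure several times higher is plausible for sizing the write path.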

Infrastructure Requirements

  • Timeline Cache: 50TB Redis for hot timelines
  • Tweet Storage: 500TB total (tweets + metadata)
  • Processing Power: 10K servers for fan-out processing

Fan-out Strategies

1. Fan-out on Write (Push Model)

Pre-compute timelines when tweets are posted

Pros

  • Fast reads
  • Simple read logic
  • Good for average users

Cons

  • Expensive writes for celebrities
  • Storage overhead
  • Stale data issues

Best For

Users with < 1M followers
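A minimal in-memory sketch of the push model, to make the write/read trade-off concrete. The dictionaries stand in for the social graph and the timeline cache. The function names and the `TIMELINE_LIMIT` cap are assumptions made for illustration, not part of the design above.

```python
from collections import defaultdict, deque

TIMELINE_LIMIT = 800  # assumed cap on cached entries per user

followers = defaultdict(set)  # author_id -> set of follower ids
# user_id -> tweet ids, newest first; the oldest entry is dropped when full
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))


def follow(follower_id, author_id):
    followers[author_id].add(follower_id)


def post_tweet(author_id, tweet_id):
    """Write path: copy the tweet id into every follower's timeline."""
    for follower_id in followers[author_id]:
        timelines[follower_id].appendleft(tweet_id)
    return len(followers[author_id])  # number of deliveries performed


def read_timeline(user_id, limit=20):
    """Read path: a single lookup with no aggregation."""
    return list(timelines[user_id])[:limit]
```

The cost of `post_tweet` grows linearly with the follower count, which is why this model becomes expensive for celebrity accounts.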

2. Fan-out on Read (Pull Model)

Generate timelines on demand when users request them

Pros

  • No storage waste
  • Always fresh data
  • Handles celebrities well

Cons

  • Slow reads
  • Complex aggregation
  • Hot spotting issues

Best For

Celebrity users with millions of followers
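A comparable sketch of the pull model. Writes are a single insert, and the read path does a k-way merge over every followed author's tweets. It assumes timestamps increase in posting order so each per-author list stays sorted newest first. All names are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import islice

following = defaultdict(set)          # user_id -> authors they follow
tweets_by_author = defaultdict(list)  # author_id -> [(timestamp, tweet_id)], newest first


def follow(follower_id, author_id):
    following[follower_id].add(author_id)


def post_tweet(author_id, tweet_id, timestamp):
    """Write path: one insert, regardless of follower count."""
    tweets_by_author[author_id].insert(0, (timestamp, tweet_id))


def read_timeline(user_id, limit=20):
    """Read path: merge the newest-first streams of all followed authors."""
    streams = [tweets_by_author[author] for author in following[user_id]]
    merged = heapq.merge(*streams, reverse=True)  # inputs are already descending
    return [tweet_id for _, tweet_id in islice(merged, limit)]
```

Here the read cost grows with the number of accounts a user follows, which is the "slow reads" and "complex aggregation" cost listed above.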

3. Hybrid Model

Push for most users, pull for celebrities, merge at read time

Pros

  • Best of both worlds
  • Optimized performance
  • Scalable approach

Cons

  • Complex implementation
  • Merge complexity
  • Cache invalidation

Best For

Production systems (Twitter, Facebook, Instagram)
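The hybrid model can be sketched by routing on follower count at write time and merging at read time. This is an illustration under stated assumptions, not a description of how any production system does it: `CELEBRITY_THRESHOLD` is set to a tiny value so the example is easy to follow (the text suggests roughly 1M followers), timestamps are assumed to increase in posting order, and all names are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import islice

CELEBRITY_THRESHOLD = 3  # demo value only; a real cutoff would be far higher

followers = defaultdict(set)          # author_id -> follower ids
following = defaultdict(set)          # user_id -> author ids
pushed = defaultdict(list)            # user_id -> [(ts, tweet_id)], newest first
celebrity_tweets = defaultdict(list)  # celebrity_id -> [(ts, tweet_id)], newest first


def follow(follower_id, author_id):
    followers[author_id].add(follower_id)
    following[follower_id].add(author_id)


def is_celebrity(author_id):
    return len(followers[author_id]) >= CELEBRITY_THRESHOLD


def post_tweet(author_id, tweet_id, ts):
    """Write path: push for regular authors, store once for celebrities."""
    if is_celebrity(author_id):
        celebrity_tweets[author_id].insert(0, (ts, tweet_id))
        return "pull"
    for follower_id in followers[author_id]:
        pushed[follower_id].insert(0, (ts, tweet_id))
    return "push"


def read_timeline(user_id, limit=20):
    """Read path: merge the precomputed timeline with followed celebrities' tweets."""
    celeb_streams = [
        celebrity_tweets[author]
        for author in following[user_id]
        if author in celebrity_tweets
    ]
    merged = heapq.merge(pushed[user_id], *celeb_streams, reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, limit)]
```

The read-time merge only touches the celebrities a user follows, so it stays much cheaper than a full pull across every followed account.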

System Architecture

Write Path:
Mobile/Web → API Gateway → Tweet Service → Message Queue
Fan-out Service → Social Graph Service → Timeline Cache (Redis)
Notification Service → Push Notifications
Read Path:
Mobile/Web → API Gateway → Timeline Service
Timeline Cache (Redis) + Celebrity Tweet Service (for pull model)
Ranking Service → Personalized Timeline Response

Tweet Service

Java microservices, MySQL/PostgreSQL
Purpose: Handle tweet creation, deletion, and metadata
Scale: 20K tweets/second peak

Fan-out Service

Kafka, Redis, async processing
Purpose: Distribute tweets to follower timelines
Scale: 500M tweets fanned out per day (about 75B timeline deliveries at the 150x average fan-out)

Timeline Service

Redis, Cassandra, ranking ML models
Purpose: Generate and serve personalized timelines
Scale: 1M timeline requests/second

Social Graph Service

Graph database (Neo4j), Redis cache
Purpose: Manage follow relationships and friend networks
Scale: Tens of billions of follow relationships (300M users at the 150x average fan-out)

Content Ranking Algorithm

Content Quality (High Impact)

  • Engagement rate (likes, retweets, replies)
  • Content freshness and recency
  • Media presence (images, videos)
  • Tweet length and readability

Social Signals (Very High Impact)

  • Author relationship strength
  • Mutual connections
  • Author authority/verification
  • Historical interaction patterns

Personalization (High Impact)

  • User interests and topics
  • Past engagement history
  • Time of day patterns
  • Device and location context

Trending & Viral (Medium Impact)

  • Trending topic participation
  • Viral coefficient
  • Geographic relevance
  • Breaking news signals

Ranking Pipeline

Feature Extraction

• Real-time engagement metrics
• User interaction history
• Content semantic analysis
• Social graph signals

ML Scoring

• Gradient boosting models
• Deep learning embeddings
• Multi-task learning
• A/B testing framework

Timeline Assembly

• Score-based sorting
• Diversity injection
• Fresh content boosting
• Spam/quality filtering
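The Timeline Assembly stage can be sketched as a small post-processing step over already-scored candidates. Everything below is an assumption made for illustration: the `Candidate` fields, the spam cutoff, the freshness boost, and the per-author penalty are hypothetical values, and `model_score` stands in for whatever the ML scoring stage produces.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    tweet_id: str
    author_id: str
    model_score: float       # output of the ML scoring stage, assumed in [0, 1]
    age_hours: float
    spam_probability: float  # output of a separate quality classifier (assumed)


SPAM_CUTOFF = 0.8            # drop candidates at or above this spam probability
FRESH_WINDOW_HOURS = 1.0     # tweets younger than this get a boost
FRESH_BOOST = 1.2
AUTHOR_REPEAT_PENALTY = 0.5  # each earlier pick from the same author halves the score


def freshness_adjusted(candidate):
    boost = FRESH_BOOST if candidate.age_hours <= FRESH_WINDOW_HOURS else 1.0
    return candidate.model_score * boost


def assemble_timeline(candidates, limit=20):
    # Spam/quality filtering, then fresh content boosting.
    pool = [
        (freshness_adjusted(c), c)
        for c in candidates
        if c.spam_probability < SPAM_CUTOFF
    ]
    picks_per_author = {}
    timeline = []
    # Score-based selection with diversity injection: repeatedly take the best
    # remaining candidate after penalizing authors that were already picked.
    while pool and len(timeline) < limit:
        best_index = max(
            range(len(pool)),
            key=lambda i: pool[i][0]
            * AUTHOR_REPEAT_PENALTY ** picks_per_author.get(pool[i][1].author_id, 0),
        )
        _, chosen = pool.pop(best_index)
        picks_per_author[chosen.author_id] = picks_per_author.get(chosen.author_id, 0) + 1
        timeline.append(chosen.tweet_id)
    return timeline
```

With two strong tweets from one author and a fresher tweet from another, the penalty interleaves the authors instead of returning a pure score-sorted list.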

Scaling Challenges & Solutions

1. Celebrity Problem

Problem: Users with 50M+ followers cause massive fan-out amplification
Solution: Hybrid approach: pull model for celebrities, push for regular users
Implementation: Threshold-based routing, separate celebrity tweet processing

2. Timeline Freshness

Problem: Balance between real-time updates and system performance
Solution: Tiered caching with different TTLs based on user activity
Implementation: Active users: 30s cache, inactive: 10min cache
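A minimal sketch of tiered TTLs, assuming a user counts as "active" if they requested a timeline within the last 15 minutes. The text does not define that window, so `ACTIVE_WINDOW_SECONDS` is an assumption, as are the class and parameter names. The 30s and 10min TTLs come from the implementation note above.

```python
import time

ACTIVE_TTL_SECONDS = 30          # from the text: active users
INACTIVE_TTL_SECONDS = 10 * 60   # from the text: inactive users
ACTIVE_WINDOW_SECONDS = 15 * 60  # assumed definition of "active"


class TimelineCache:
    def __init__(self, build_timeline, clock=time.monotonic):
        self._build = build_timeline  # callable: user_id -> timeline
        self._clock = clock           # injectable so tests can control time
        self._entries = {}            # user_id -> (expires_at, timeline)
        self._last_seen = {}          # user_id -> time of last request

    def ttl_for(self, user_id):
        last = self._last_seen.get(user_id)
        if last is not None and self._clock() - last <= ACTIVE_WINDOW_SECONDS:
            return ACTIVE_TTL_SECONDS
        return INACTIVE_TTL_SECONDS

    def get(self, user_id):
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is None or entry[0] <= now:
            # Miss or expired: rebuild and pick the TTL from recent activity.
            entry = (now + self.ttl_for(user_id), self._build(user_id))
            self._entries[user_id] = entry
        self._last_seen[user_id] = now
        return entry[1]
```

A user who keeps refreshing gets entries that expire after 30 seconds, while a user returning after a long gap is served a longer-lived entry first.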

3. Hot Partition Problem

Problem: Popular tweets create hotspots in timeline storage
Solution: Consistent hashing with virtual nodes and load balancing
Implementation: Partition by user_id with hot partition detection
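Consistent hashing with virtual nodes can be sketched in a few lines. The node names, the choice of MD5 as the hash, and the count of 100 virtual nodes per server are assumptions made for illustration. Hot partition detection is not covered here.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is randomized per process.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node is placed on the ring at `vnodes` positions.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, user_id: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash,
        # wrapping around to the start of the ring if needed.
        index = bisect.bisect_right(self._points, _hash(user_id)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
print(ring.node_for("user-42"))
```

When a node is added, only the keys that now map to the new node move. The rest stay where they were, which is the property that makes resharding cheap.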

4. Ranking at Scale

Problem: Real-time ML ranking for millions of timeline requests
Solution: Pre-computed features with lightweight real-time scoring
Implementation: Feature store + online inference with <50ms latency

Database Design

Tweet Schema

tweets table:
  - tweet_id (UUID, PK)
  - user_id (UUID, indexed)
  - content (text, 280 chars)
  - media_urls (JSON array)
  - created_at (timestamp)
  - reply_to_tweet_id (UUID, nullable)
  - retweet_of_tweet_id (UUID, nullable)
  - like_count (int, default 0)
  - retweet_count (int, default 0)
  - reply_count (int, default 0)

Timeline Cache Schema

Redis Timeline Cache:

Key: "timeline:{user_id}"
Value: Sorted Set (ZSET)
  - Score: timestamp
  - Member: tweet_id

Key: "tweet:{tweet_id}"
Value: Hash
  - content, user_id, created_at
  - like_count, retweet_count
  - media_urls, etc.

Social Graph Schema

follows table:
  - follower_id (UUID, PK)
  - followee_id (UUID, PK)
  - created_at (timestamp)
  - relationship_type (follow/close_friend)

Indexes:
  - (follower_id, created_at) for "following" list
  - (followee_id, created_at) for "followers" list
  - Composite index for bidirectional lookup
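The follows table can be tried out with an in-memory SQLite database. This is a sketch under assumptions: UUIDs and timestamps are stored as text, the index names are hypothetical, and the composite primary key on (follower_id, followee_id) is taken to cover the point lookup "does A follow B?". A production deployment would more likely use a sharded store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE follows (
    follower_id       TEXT NOT NULL,  -- UUID stored as text in this sketch
    followee_id       TEXT NOT NULL,
    created_at        TEXT NOT NULL,
    relationship_type TEXT NOT NULL DEFAULT 'follow'
        CHECK (relationship_type IN ('follow', 'close_friend')),
    PRIMARY KEY (follower_id, followee_id)
);
CREATE INDEX idx_following ON follows (follower_id, created_at);
CREATE INDEX idx_followers ON follows (followee_id, created_at);
""")


def follow(follower_id, followee_id, created_at):
    # The composite primary key makes a repeated follow a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO follows (follower_id, followee_id, created_at) "
        "VALUES (?, ?, ?)",
        (follower_id, followee_id, created_at),
    )


def followers_of(user_id):
    rows = conn.execute(
        "SELECT follower_id FROM follows WHERE followee_id = ? "
        "ORDER BY created_at DESC",
        (user_id,),
    )
    return [row[0] for row in rows]


def following_of(user_id):
    rows = conn.execute(
        "SELECT followee_id FROM follows WHERE follower_id = ? "
        "ORDER BY created_at DESC",
        (user_id,),
    )
    return [row[0] for row in rows]
```

The fan-out service would call something like `followers_of` on every post, which is why the (followee_id, created_at) index matters most on the write path.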

Practice Questions

1. How would you handle a celebrity with 50M followers posting a tweet? What's the fan-out strategy?

2. Design the ranking algorithm. What features would you use to determine tweet relevance for a user?

3. How would you implement timeline pagination efficiently? Consider both performance and consistency.

4. Design a system to detect and handle trending topics in real-time. How do you identify viral content?

5. How would you implement real-time notifications for likes, retweets, and mentions without overwhelming users?