Design a Search Ranking System
Build a machine learning-powered search system that delivers highly relevant, personalized results at massive scale with real-time indexing and ranking.
System Requirements
Functional Requirements
- Real-time search with sub-second latency
- Relevance scoring with multiple signals
- Personalized search results per user
- Query auto-completion and suggestions
- Faceted search and filtering
- Search analytics and click tracking
- A/B testing for ranking algorithms
- Safe search and content filtering
Non-Functional Requirements
- Handle 100K+ queries per second
- Index 10B+ documents with real-time updates
- P95 query response time under 200 ms
- 99.9% search availability
- Support 50+ languages and locales
- Handle typos and fuzzy matching
- Scale to petabytes of indexed content
- Maintain relevance quality > 90%
Ranking Signal Architecture
- Text Relevance: TF-IDF, BM25, semantic matching
- Authority/Quality: domain authority, page quality, freshness
- User Behavior: click-through rates, dwell time, bounce rate
- Personalization: user history, preferences, location
- Business Logic: promoted content, partnerships, compliance
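As an illustration of the text-relevance signal, here is a minimal sketch of Okapi BM25 scoring over pre-tokenized documents. The function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, and the smoothed IDF variant are assumptions chosen for the example, not values prescribed by this design. The sketch also assumes a non-empty corpus.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with BM25.

    k1 and b are common default choices (assumed here, tune per corpus).
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "search ranking with machine learning".split(),
    "cooking pasta at home".split(),
    "learning to rank for search engines".split(),
]
scores = bm25_scores("search ranking".split(), docs)
```

In a full ranker this score would typically be one feature among the other signal groups rather than the final ordering.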
System Architecture Components
Query Processing
- Query parsing & analysis
- Intent classification
- Query expansion
- Spell correction
- Auto-completion
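One simple way to picture auto-completion is a sorted list of past queries searched by prefix and ranked by popularity. The sketch below assumes a hypothetical `PrefixCompleter` class and a small in-memory query log. A production system would more likely use a trie or finite-state structure with precomputed top-k lists.

```python
import bisect

class PrefixCompleter:
    """Prefix completion over a sorted query list (illustrative only)."""

    def __init__(self, query_counts):
        # Sorting makes all completions of a prefix contiguous.
        self._queries = sorted(query_counts)
        self._counts = dict(query_counts)

    def complete(self, prefix, limit=3):
        # Binary search for the first query that is >= the prefix.
        start = bisect.bisect_left(self._queries, prefix)
        matches = []
        for q in self._queries[start:]:
            if not q.startswith(prefix):
                break
            matches.append(q)
        # Most popular first; ties broken alphabetically.
        matches.sort(key=lambda q: (-self._counts[q], q))
        return matches[:limit]

# Hypothetical query log with made-up counts.
completer = PrefixCompleter({
    "search engine": 120,
    "search ranking": 300,
    "seattle weather": 80,
    "semantic search": 50,
})
suggestions = completer.complete("sea")
```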
Search Engine
- Inverted index
- Sharding & replication
- Faceted search
- Fuzzy matching
- Caching layer
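The inverted index at the core of the search engine can be sketched as a map from each term to the IDs of documents containing it. The helper names `build_inverted_index` and `search_and` are hypothetical, and whitespace tokenization stands in for a real analyzer.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def search_and(index, query):
    """Return IDs of documents containing every query term (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return []
    # Intersect the shortest postings lists first to keep results small.
    lists = sorted((index.get(t, []) for t in terms), key=len)
    result = set(lists[0])
    for plist in lists[1:]:
        result &= set(plist)
    return sorted(result)

docs = {
    1: "Search ranking systems",
    2: "Ranking signals for search",
    3: "Image search",
}
index = build_inverted_index(docs)
hits = search_and(index, "search ranking")
```

At the scale described in the requirements, the postings would be partitioned across shards and each query fanned out, with the per-shard results merged before ranking.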
Ranking Engine
- ML model serving
- Feature extraction
- Score combination
- Personalization
- A/B testing
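For A/B testing of ranking algorithms, each user usually needs a stable assignment to one variant. A minimal sketch, assuming a hypothetical `assign_variant` helper that hashes the experiment name together with the user ID:

```python
import hashlib

def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministically map a user to a variant for a given experiment."""
    # Including the experiment name gives independent splits per experiment.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]

# Hypothetical experiment name and user IDs.
variant = assign_variant("user-42", "ranker-v2")
counts = {"control": 0, "treatment": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", "ranker-v2")] += 1
```

Hashing avoids storing an assignment table and keeps a user in the same variant across sessions, which matters when measuring behavior over time.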
Content Indexer
- Web crawling
- Content extraction
- Duplicate detection
- Quality scoring
- Real-time updates
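Duplicate detection is often approached by comparing sets of word shingles. The sketch below uses exact Jaccard similarity over 3-word shingles. The function names and the `0.6` threshold are assumptions for illustration. At billions of documents an approximation such as MinHash or SimHash would normally replace the exact comparison.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles of a text."""
    tokens = text.lower().split()
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_near_duplicate(text_a, text_b, threshold=0.6, k=3):
    # threshold is an assumed cutoff; tune it against labeled duplicates.
    return jaccard(shingles(text_a, k), shingles(text_b, k)) >= threshold

a = "the quick brown fox jumps over the lazy dog"
b = "the quick brown fox jumps over the lazy cat"
c = "search ranking systems combine many relevance signals"
similarity = jaccard(shingles(a), shingles(b))
```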
Analytics Engine
- Click tracking
- Query analytics
- Performance monitoring
- User behavior analysis
- Relevance evaluation
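Relevance evaluation is commonly done offline with a graded metric such as NDCG. The sketch below assumes relevance labels are integers (0 = irrelevant, higher = better) listed in the order the ranker returned the results. The function names are illustrative.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)
        for rank, rel in enumerate(relevances[:k])
    )

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

perfect = ndcg_at_k([3, 2, 1, 0], k=4)
reversed_order = ndcg_at_k([0, 1, 2, 3], k=4)
```

A ranking already in ideal order scores 1.0, and any ordering that pushes relevant results down scores lower.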
User Profile Service
- Search history
- Preference learning
- User embeddings
- Privacy controls
- Personalization
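One way user embeddings might feed personalization is by blending the base ranking score with the similarity between a user vector and each document vector. This is a sketch under assumptions: the `personalize` function, the 2-dimensional vectors, and the `weight=0.3` blend are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def personalize(results, user_vec, weight=0.3):
    """Re-rank (doc_id, base_score, doc_vec) tuples for one user.

    weight controls how much the user-document similarity counts.
    """
    rescored = [
        (doc_id, (1 - weight) * base + weight * cosine(user_vec, vec))
        for doc_id, base, vec in results
    ]
    return sorted(rescored, key=lambda r: r[1], reverse=True)

# Hypothetical toy data: the user leans toward the first embedding dimension.
user_vec = [1.0, 0.0]
results = [("a", 0.60, [0.0, 1.0]), ("b", 0.55, [1.0, 0.0])]
ranked = personalize(results, user_vec)
```

Keeping `weight` modest is one lever for limiting filter-bubble effects, since the base relevance score still dominates.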
Capacity Estimation
Search Traffic & Performance
Performance Metrics
Infrastructure Requirements
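A rough back-of-the-envelope estimate can be derived from the non-functional requirements. Only the query rate and document count below come from the requirements. The average document size, index-to-raw ratio, shard size, and replica count are assumptions picked to make the arithmetic concrete.

```python
import math

# From the requirements.
QPS = 100_000                  # queries per second, treated as sustained
NUM_DOCS = 10_000_000_000      # 10B documents

# Assumptions for illustration only.
AVG_DOC_BYTES = 100_000        # ~100 KB of raw content per document
INDEX_PERCENT = 20             # index size as a percentage of raw content
SHARD_BYTES = 500 * 10**9      # ~500 GB of index per shard
REPLICAS = 3                   # copies of each shard

raw_bytes = NUM_DOCS * AVG_DOC_BYTES              # total raw content
index_bytes = raw_bytes * INDEX_PERCENT // 100    # total index size
shards = math.ceil(index_bytes / SHARD_BYTES)     # primary shards needed
shard_copies = shards * REPLICAS                  # shards including replicas
queries_per_day = QPS * 86_400                    # seconds per day

print(f"raw content:   {raw_bytes / 10**15:.1f} PB")
print(f"index size:    {index_bytes / 10**12:.0f} TB")
print(f"shards:        {shards} primary, {shard_copies} with replicas")
print(f"queries/day:   {queries_per_day / 10**9:.2f} billion")
```

Under these assumptions the raw content comes to about 1 PB, which is in line with the petabyte-scale requirement. Changing any assumed constant scales the result linearly.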
Practice Questions
Design a learning-to-rank system that incorporates both content features and user behavior signals in real-time.
How would you handle query expansion and semantic search to improve recall for long-tail queries?
Design an A/B testing framework for search ranking algorithms that accounts for position bias and novelty effects.
How would you implement personalized search while maintaining user privacy and avoiding filter bubbles?
Design a real-time indexing pipeline that can handle billions of document updates while maintaining search consistency.