Design a Recommendation System - ML System Design

Complete ML system design walkthrough: from problem framing to production deployment.

For ML systems, focus on data availability, business objectives, and model constraints.

You:

What type of recommendations are we building? (products, content, ads, etc.)

Interviewer:

E-commerce product recommendations, similar to Amazon's 'Customers who bought X also bought Y' and personalized homepage.

💡 Your Analysis: Two surfaces to serve: item-to-item ('also bought') and a personalized homepage. Need both collaborative filtering and content-based approaches.
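To make the 'Customers who bought X also bought Y' surface concrete, here is a minimal sketch of item-to-item co-occurrence counting, the simplest form of collaborative filtering. The basket data and the function names (build_also_bought, also_bought) are hypothetical, and a real system would likely normalize counts by item popularity rather than rank on raw counts.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_also_bought(baskets):
    """Count how often each pair of items appears in the same basket."""
    co_counts = defaultdict(Counter)
    for basket in baskets:
        # De-duplicate within a basket so repeated items count once.
        for a, b in combinations(sorted(set(basket)), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return co_counts


def also_bought(co_counts, item, k=3):
    """Return up to k items most often co-purchased with `item`."""
    neighbours = co_counts.get(item, Counter())
    return [other for other, _ in neighbours.most_common(k)]


# Toy purchase history (made up for illustration).
baskets = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["phone", "charger"],
    ["laptop", "mouse"],
    ["phone", "case", "screen_protector"],
]
co_counts = build_also_bought(baskets)
top_for_phone = also_bought(co_counts, "phone", k=2)
print(top_for_phone)
```

Raw co-occurrence says nothing about products with no purchase history, which is one reason the content-based side is still needed.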

You:

What's the scale? Number of users and items?

Interviewer:

50 million active users, 100 million products, 1 billion interactions per day.

💡 Your Analysis: Large scale requires distributed training and serving infrastructure.
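A quick back-of-envelope calculation helps justify the "distributed" claim. The sketch below uses the interviewer's figures; the peak-to-average ratio, embedding dimension, and float width are assumptions chosen only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures stated by the interviewer.
ACTIVE_USERS = 50_000_000
PRODUCTS = 100_000_000
INTERACTIONS_PER_DAY = 1_000_000_000

# Assumptions for illustration only (not given in the interview).
PEAK_TO_AVERAGE = 3      # traffic peaks at roughly 3x the daily average
EMBEDDING_DIM = 64       # floats per user or item embedding
BYTES_PER_FLOAT32 = 4

avg_events_per_sec = INTERACTIONS_PER_DAY / SECONDS_PER_DAY
peak_events_per_sec = avg_events_per_sec * PEAK_TO_AVERAGE
interactions_per_user = INTERACTIONS_PER_DAY / ACTIVE_USERS


def embedding_table_gb(rows):
    """Size of a dense float32 embedding table in gigabytes (10^9 bytes)."""
    return rows * EMBEDDING_DIM * BYTES_PER_FLOAT32 / 1e9


item_table_gb = embedding_table_gb(PRODUCTS)
user_table_gb = embedding_table_gb(ACTIVE_USERS)

print(f"avg events/s:  {avg_events_per_sec:,.0f}")
print(f"peak events/s: {peak_events_per_sec:,.0f}")
print(f"interactions per user per day: {interactions_per_user:.0f}")
print(f"item embeddings: {item_table_gb:.1f} GB, user embeddings: {user_table_gb:.1f} GB")
```

Under these assumptions the event stream averages roughly 11,600 events per second and the item embedding table alone is about 25.6 GB, which already argues against a single-machine design.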

You:

What user interactions do we have access to?

Interviewer:

Views, clicks, add-to-cart, purchases, ratings (1-5 stars), search queries, time spent on page.

💡 Your Analysis: Rich implicit feedback (views, clicks, add-to-cart, purchases, search queries, time on page) plus explicit ratings - can use multi-objective learning.
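One way to picture multi-objective learning here: derive one binary label per objective from the interaction log, then train against a weighted sum of per-task losses. The sketch below is illustrative only; the task weights and function names are assumptions, and in practice the weights would be tuned against the business objective.

```python
import math

# Assumed per-task loss weights (hypothetical, to be tuned).
TASK_WEIGHTS = {"click": 0.2, "add_to_cart": 0.3, "purchase": 0.5}


def build_labels(events):
    """Turn (user, item, event_type) rows into one 0/1 label per task.

    Event types without a task (e.g. "view") still create a row, which
    acts as a negative example for every task.
    """
    labels = {}
    for user, item, event_type in events:
        row = labels.setdefault((user, item), {task: 0 for task in TASK_WEIGHTS})
        if event_type in row:
            row[event_type] = 1
    return labels


def binary_cross_entropy(label, prob, eps=1e-7):
    """Log loss for a single prediction, clipped for numerical safety."""
    prob = min(max(prob, eps), 1 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))


def multi_objective_loss(task_labels, task_probs):
    """Weighted sum of per-task losses for one (user, item) pair."""
    return sum(
        weight * binary_cross_entropy(task_labels[task], task_probs[task])
        for task, weight in TASK_WEIGHTS.items()
    )


events = [
    ("u1", "i1", "view"),
    ("u1", "i1", "click"),
    ("u1", "i1", "purchase"),
    ("u2", "i9", "view"),
]
labels = build_labels(events)
print(labels[("u1", "i1")])
```

Ratings and time on page are left out of this sketch; they could become additional regression tasks or input features.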

You:

What's the business objective we're optimizing for?

Interviewer:

Primary: Increase conversion rate (purchases). Secondary: Increase average order value and user engagement.

💡 Your Analysis: Need to balance relevance with revenue - not just accuracy.
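A simple way to express this trade-off is to blend the predicted purchase probability with expected revenue at ranking time. The formula, the revenue_weight parameter, and the candidate data below are all assumptions for illustration, not a prescribed scoring function.

```python
def blended_score(p_purchase, price, max_price, revenue_weight):
    """Mix relevance (purchase probability) with price-scaled expected revenue.

    revenue_weight = 0 ranks purely by conversion likelihood;
    higher values favour items with larger expected order value.
    """
    expected_revenue = p_purchase * price / max_price  # scaled to [0, 1]
    return (1 - revenue_weight) * p_purchase + revenue_weight * expected_revenue


def rank(candidates, revenue_weight):
    """Return candidate ids sorted by blended score, best first."""
    max_price = max(c["price"] for c in candidates)
    ordered = sorted(
        candidates,
        key=lambda c: blended_score(c["p_purchase"], c["price"], max_price, revenue_weight),
        reverse=True,
    )
    return [c["id"] for c in ordered]


# Hypothetical model outputs for three candidate products.
candidates = [
    {"id": "A", "p_purchase": 0.10, "price": 20.0},
    {"id": "B", "p_purchase": 0.08, "price": 200.0},
    {"id": "C", "p_purchase": 0.02, "price": 500.0},
]
print(rank(candidates, revenue_weight=0.0))
print(rank(candidates, revenue_weight=0.6))
```

With no revenue weight the cheap, likely purchase ranks first; at a weight of 0.6 the pricier item B overtakes it, while the expensive long shot C stays last. The weight itself is something to settle through online experiments.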

You:

Real-time or batch recommendations?

Interviewer:

Both. Real-time for homepage when user logs in, batch for email campaigns.

💡 Your Analysis: Hybrid system: pre-compute embeddings, real-time scoring.
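The hybrid split can be sketched as follows: item embeddings come from an offline job, and at request time the service builds a user vector from recent activity and scores candidates with a dot product. The embedding values, item names, and the averaging rule for the user vector are assumptions; the brute-force scan stands in for whatever approximate nearest-neighbour index a real deployment would use at 100 million items.

```python
import heapq

# Pretend output of an offline (batch) embedding job. Values are made up.
ITEM_EMBEDDINGS = {
    "tent": [0.9, 0.1, 0.0],
    "sleeping_bag": [0.8, 0.2, 0.1],
    "blender": [0.0, 0.9, 0.3],
    "novel": [0.1, 0.1, 0.9],
}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def user_embedding(recent_items):
    """Request-time user vector: mean of recently viewed item embeddings."""
    vectors = [ITEM_EMBEDDINGS[item] for item in recent_items]
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def recommend(recent_items, k=2):
    """Score every unseen item against the user vector and keep the top k."""
    query = user_embedding(recent_items)
    seen = set(recent_items)
    scored = (
        (dot(query, vec), item)
        for item, vec in ITEM_EMBEDDINGS.items()
        if item not in seen
    )
    return [item for _, item in heapq.nlargest(k, scored)]


print(recommend(["tent"], k=2))
```

The same precomputed embeddings can feed the batch path for email campaigns, where latency is not a concern.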

You:

How fresh do recommendations need to be?

Interviewer:

Should reflect inventory changes within 1 hour, user behavior within 5 minutes.

💡 Your Analysis: Stream user-behavior features in near real time (under 5 minutes of lag), sync inventory at least hourly, and retrain models periodically.
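The two freshness targets can be written down as explicit staleness budgets and checked against feature timestamps. This is a minimal monitoring sketch; the feature-group names and the idea of a per-group last-updated timestamp are assumptions about how the feature store is organized.

```python
# Freshness budgets from the interview, in seconds.
MAX_AGE_SECONDS = {
    "user_behavior": 5 * 60,   # reflect behavior within 5 minutes
    "inventory": 60 * 60,      # reflect inventory changes within 1 hour
}


def is_stale(feature_group, updated_at, now):
    """True if the feature group is older than its freshness budget."""
    return now - updated_at > MAX_AGE_SECONDS[feature_group]


def stale_groups(last_updated, now):
    """Return the names of all feature groups that violate their budget."""
    return sorted(
        group for group, updated_at in last_updated.items()
        if is_stale(group, updated_at, now)
    )


# Example with plain epoch-style timestamps (seconds).
now = 10_000
last_updated = {
    "user_behavior": 9_800,  # 200 s old: within the 5-minute budget
    "inventory": 6_000,      # 4,000 s old: beyond the 1-hour budget
}
print(stale_groups(last_updated, now))
```

A check like this could back an alert, or a fallback that filters out items whose inventory status is too old to trust.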

You:

Any constraints on recommendation diversity or fairness?

Interviewer:

Yes, avoid filter bubbles, ensure category diversity, and give new products a chance.

💡 Your Analysis: Need exploration strategies and diversity re-ranking.
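As one possible shape for this step, the sketch below applies a per-category cap to the ranked list and reserves a slot for a new product. The cap, the number of reserved slots, and the candidate fields (id, category, is_new) are hypothetical; other choices such as MMR or bandit-based exploration would fit the same place in the pipeline.

```python
def rerank(ranked, k, max_per_category=2, new_item_slots=1):
    """Greedy diversity re-rank over candidates sorted by score (best first).

    - At most `max_per_category` regular picks per category.
    - The best-scoring new products fill `new_item_slots` reserved slots at
      the end of the list (they are not counted against the category cap).
    """
    explore = [c for c in ranked if c["is_new"]][:new_item_slots]
    explore_ids = {c["id"] for c in explore}

    picked, per_category = [], {}
    for cand in ranked:
        if len(picked) >= k - len(explore):
            break
        if cand["id"] in explore_ids:
            continue
        count = per_category.get(cand["category"], 0)
        if count >= max_per_category:
            continue  # category already well represented
        per_category[cand["category"]] = count + 1
        picked.append(cand)

    return [c["id"] for c in picked + explore]


# Hypothetical candidates, already sorted by model score.
ranked = [
    {"id": "e1", "category": "electronics", "is_new": False},
    {"id": "e2", "category": "electronics", "is_new": False},
    {"id": "e3", "category": "electronics", "is_new": False},
    {"id": "b1", "category": "books", "is_new": False},
    {"id": "h1", "category": "home", "is_new": True},
    {"id": "b2", "category": "books", "is_new": False},
]
print(rerank(ranked, k=4))
```

Here the third electronics item is skipped in favour of a book, and the new home product gets the reserved slot even though it scored lower.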

You:

What's the latency requirement for serving?

Interviewer:

P99 < 100ms for API calls, including all personalization logic.

💡 Your Analysis: Requires extensive caching and optimized serving infrastructure.
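A small time-to-live cache illustrates the caching idea: serve a stored recommendation list while it is still fresh, and recompute only on a miss or after expiry. The class name, the injected clock, and the 300-second TTL (chosen to line up with the 5-minute behavior-freshness target) are assumptions; a production cache would also need a size bound and would usually live in a shared store.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._store = {}     # key -> (stored_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value, or call `compute()` and cache the result."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = compute()
        self._store[key] = (now, value)
        return value


# Usage with a fake clock to show hit, hit, then expiry.
fake_now = [0.0]
calls = []


def slow_ranker():
    calls.append(1)  # stands in for the expensive scoring path
    return ["p1", "p2", "p3"]


cache = TTLCache(ttl_seconds=300, clock=lambda: fake_now[0])
cache.get_or_compute("user:42", slow_ranker)   # miss: computes
fake_now[0] = 120.0
cache.get_or_compute("user:42", slow_ranker)   # hit: served from cache
fake_now[0] = 301.0
cache.get_or_compute("user:42", slow_ranker)   # expired: recomputes
print(len(calls))
```

The trade-off is direct: a longer TTL lowers tail latency and load but pushes against the freshness requirement.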

🎯 ML System Design Interview Tips

ML-Specific Focus Areas:
  • ✅ Frame the business problem as an ML objective
  • ✅ Discuss data quality and availability
  • ✅ Compare multiple model architectures
  • ✅ Address training-serving skew
  • ✅ Plan for model monitoring and retraining
  • ✅ Consider fairness and bias

Common Pitfalls:
  • ❌ Jumping to complex models too quickly
  • ❌ Ignoring data pipeline complexity
  • ❌ Forgetting about feature engineering
  • ❌ Not discussing offline vs online metrics
  • ❌ Overlooking model versioning
  • ❌ Missing production challenges
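
On the offline vs online metrics pitfall, it helps to show one of each. The sketch below computes recall@k on held-out purchases (offline) and relative conversion lift from an A/B test (online). The function names and all numbers are made up for illustration.

```python
def recall_at_k(recommended, relevant, k):
    """Offline: fraction of held-out relevant items found in the top k."""
    if not relevant:
        return 0.0
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / len(relevant)


def conversion_rate(purchases, sessions):
    """Online: share of sessions that ended in a purchase."""
    return purchases / sessions


def relative_lift(control_rate, treatment_rate):
    """Online: relative change of treatment over control."""
    return (treatment_rate - control_rate) / control_rate


# Offline evaluation on one user's held-out purchases (toy data).
recommended = ["a", "b", "c", "d"]
held_out_purchases = {"b", "d", "e"}
print(recall_at_k(recommended, held_out_purchases, k=2))
print(recall_at_k(recommended, held_out_purchases, k=4))

# Online evaluation from a hypothetical A/B test.
control = conversion_rate(purchases=200, sessions=10_000)
treatment = conversion_rate(purchases=230, sessions=10_000)
print(f"{relative_lift(control, treatment):.1%}")
```

A model can improve recall@k offline and still fail to move conversion online, so the discussion should cover both and explain how an offline win gets validated by an experiment.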