
Design Google Translate

Design a large-scale neural machine translation system supporting 108 languages with 1B+ daily requests, near-human quality for major pairs, and <200ms latency.

Neural MT · Multilingual · Global Scale
Q: What's the expected scale and global distribution?
A: 1 billion+ translation requests daily, serving users across all continents with peak loads during business hours in each timezone.
Engineering Implications: Massive global scale requires multi-region deployment with intelligent traffic routing. 1B requests/day averages about 11.6K QPS (1B / 86,400 s), with peaks around 35K QPS, roughly 3x the average. Regional data centers and edge processing are needed to meet latency requirements.
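The throughput figures can be checked with a few lines of arithmetic. This is a minimal sketch; the peak factor of 3 is an assumption inferred from the 35K peak figure, not a separately stated requirement.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(requests_per_day: float) -> float:
    """Average queries per second if traffic were spread evenly over a day."""
    return requests_per_day / SECONDS_PER_DAY


def peak_qps(requests_per_day: float, peak_factor: float = 3.0) -> float:
    """Peak QPS under an assumed peak-to-average ratio (hypothetical default: 3x)."""
    return average_qps(requests_per_day) * peak_factor


avg = average_qps(1_000_000_000)
peak = peak_qps(1_000_000_000)
print(f"average ~{avg:,.0f} QPS, peak ~{peak:,.0f} QPS")
# average ~11,574 QPS, peak ~34,722 QPS
```

Capacity planning would normally add headroom on top of the peak number, since regional peaks do not spread evenly across data centers.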
Q: How many languages and language pairs must we support?
A: Support 108 languages with bidirectional translation, giving 108 × 107 = 11,556 directed language pairs. Focus on quality tiers based on usage.
Engineering Implications: Combinatorial explosion problem. Can't train 11K+ separate models. Need shared multilingual architecture with quality stratification: Tier 1 (top 50 pairs, 80% traffic), Tier 2 (medium resource), Tier 3 (zero-shot).
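To make the pair count and the tiering idea concrete, here is a small sketch. The tier tables are hypothetical placeholders; a real system would presumably derive them from traffic share and training-data volume.

```python
def directed_pair_count(n_languages: int) -> int:
    """Number of ordered (source, target) pairs with source != target."""
    return n_languages * (n_languages - 1)


# Hypothetical tier tables, for illustration only.
TIER1 = {("en", "es"), ("es", "en"), ("en", "fr"), ("fr", "en")}
TIER2 = {("en", "sw"), ("sw", "en")}


def tier_for_pair(src: str, tgt: str) -> int:
    """Return 1 or 2 for listed pairs; everything else falls back to Tier 3 (zero-shot)."""
    pair = (src, tgt)
    if pair in TIER1:
        return 1
    if pair in TIER2:
        return 2
    return 3


print(directed_pair_count(108))  # 11556
print(tier_for_pair("en", "es"), tier_for_pair("is", "sw"))  # 1 3
```

Because Tier 3 is the default, a newly added language is served by the shared multilingual model immediately and can be promoted to a higher tier later.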
Q: What are the quality and latency requirements by use case?
A: Web translation <200ms, document processing <500ms, real-time conversation <100ms. Near-human quality for major pairs, acceptable quality for rare pairs.
Engineering Implications: Different SLAs drive different architectures. Real-time needs aggressive caching and model optimization. Document translation can use larger, more accurate models. Quality expectations vary: EN-ES must be near-perfect, rare pairs just need to be understandable.
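One way to picture "different SLAs drive different architectures" is a selector that picks the largest model variant fitting each use case's latency budget. The SLA values come from the requirements above; the variant names, their latencies, and the 20ms network overhead are made-up placeholders, not measurements.

```python
from dataclasses import dataclass

# Latency SLAs from the requirements (milliseconds).
SLA_MS = {"conversation": 100, "web": 200, "document": 500}


@dataclass(frozen=True)
class ModelVariant:
    name: str
    p99_latency_ms: int  # hypothetical figure, for illustration only


# Hypothetical variants, ordered from fastest to slowest.
VARIANTS = [
    ModelVariant("small-distilled", 60),
    ModelVariant("base", 150),
    ModelVariant("large", 400),
]


def pick_variant(use_case: str, network_overhead_ms: int = 20) -> ModelVariant:
    """Pick the slowest (assumed most accurate) variant that still fits the budget."""
    budget = SLA_MS[use_case] - network_overhead_ms
    eligible = [v for v in VARIANTS if v.p99_latency_ms <= budget]
    if not eligible:
        raise ValueError(f"no model variant fits a {budget}ms budget")
    return max(eligible, key=lambda v: v.p99_latency_ms)


for case in SLA_MS:
    print(case, "->", pick_variant(case).name)
```

The sketch treats latency as a proxy for quality, which is a simplification; in practice the choice would also depend on the language pair's tier.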
Q: What content types and modalities do we need to handle?
A: Text snippets, full documents (PDF/Word), web pages with HTML preservation, images with OCR, and real-time speech in conversations.
Engineering Implications: Multi-modal pipeline complexity. OCR for images, document parsing for format preservation, HTML structure maintenance, and streaming for real-time speech. Each modality has different preprocessing and quality requirements.
Q: What are the business and compliance requirements?
A: Revenue model based on API calls and premium features. GDPR compliance, data residency requirements, no persistent storage of user content.
Engineering Implications: Privacy-first design: ephemeral processing, no user data retention, regional compliance. API monetization drives cost optimization and usage analytics. Enterprise customers need dedicated instances and SLAs.

Complete Neural Machine Translation ML Systems Framework

🏗️ Section 8: Production ML System Design

  • End-to-end flow: Client → CDN → API Gateway → Language Detection → Translation → Cache
  • Multi-region deployment with intelligent traffic routing
  • Dynamic model serving with batch optimization and quantization
  • 85% cache hit rate with approximate matching and semantic similarity
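The simplest form of approximate cache matching is to normalize the input before building the key, so trivially different strings share one entry. The sketch below assumes Unicode NFKC normalization plus whitespace collapsing; the class and function names are hypothetical, and semantic-similarity lookup (which would need an embedding index) is not shown.

```python
import hashlib
import unicodedata
from typing import Callable, Dict


def normalize(text: str) -> str:
    """Apply NFKC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def cache_key(src: str, tgt: str, text: str) -> str:
    """Stable key over (source language, target language, normalized text)."""
    payload = "\x1f".join((src, tgt, normalize(text)))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TranslationCache:
    """In-memory stand-in for a distributed cache; tracks hits and misses."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get_or_translate(
        self, src: str, tgt: str, text: str,
        translate: Callable[[str, str, str], str],
    ) -> str:
        key = cache_key(src, tgt, text)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = translate(src, tgt, normalize(text))
        self._store[key] = result
        return result
```

Case is deliberately left untouched, since folding it could change the meaning of the source text (for example, "US" versus "us").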

⚖️ Key Trade-offs & Decisions

  • Universal multilingual model vs. specialized bilingual models
  • Quality tiers: Tier 1 (BLEU >45) vs. Tier 3 (BLEU >25) based on usage
  • Latency vs. quality: 200ms SLA limits model size to 1.2B parameters
  • Storage vs. compute: 100TB cache for 85% compute cost reduction

🔧 Implementation Challenges

  • Combinatorial explosion: 11,556 language pairs with shared architecture
  • Quality variance: High-resource vs. zero-shot translation quality
  • Cultural adaptation: Context, formality, and regional variations
  • Real-time constraints: Sub-200ms latency for web translation

🚀 Alternative Approaches

  • Pivot translation: 2x latency but simpler architecture
  • Bilingual models: Highest quality but 11K+ models to maintain
  • Retrieval-augmented: Memory-based translation with phrase tables
  • Federated learning: Privacy-preserving training across regions
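The pivot approach listed above can be sketched in a few lines. This assumes English as the pivot language and a hypothetical `translate(text, src, tgt)` callable standing in for the model; the two sequential calls are where the roughly 2x latency comes from.

```python
from typing import Callable, Set, Tuple

Translate = Callable[[str, str, str], str]  # (text, src, tgt) -> translated text


def pivot_translate(
    text: str,
    src: str,
    tgt: str,
    translate: Translate,
    direct_pairs: Set[Tuple[str, str]],
    pivot: str = "en",
) -> str:
    """Translate directly when a direct pair exists, otherwise go through the pivot."""
    if src == tgt:
        return text
    # Pairs that already involve the pivot have nothing to pivot through.
    if (src, tgt) in direct_pairs or pivot in (src, tgt):
        return translate(text, src, tgt)
    intermediate = translate(text, src, pivot)  # first hop: src -> pivot
    return translate(intermediate, pivot, tgt)  # second hop: pivot -> tgt
```

Besides the added latency, errors from the first hop carry into the second, which is one reason a shared multilingual model is attractive for rare pairs.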

Interview Practice Questions

Practice these open-ended questions to prepare for system design interviews. Think through each scenario and discuss trade-offs.

1. Multilingual Model Architecture: Design a translation system supporting 108 languages with 11,556 language pairs using shared multilingual models. Address zero-shot translation, quality tiers based on data availability, and efficient model serving architecture.

2. Real-time Conversation Translation: Build a system for real-time conversation translation with <100ms latency. Include streaming speech recognition, incremental translation, voice synthesis, and handling of overlapping speech and interruptions.

3. Document Translation with Format Preservation: Design document translation for PDF, Word, and PowerPoint files while maintaining formatting, layout, and embedded elements. Address OCR for scanned documents, table handling, and multi-column layouts.

4. Quality & Cultural Adaptation: Implement translation quality assessment and cultural adaptation across diverse languages. Include formality detection, regional variations, cultural context understanding, and continuous quality improvement based on user feedback.

5. Privacy-First Global Architecture: Build a globally distributed translation system meeting GDPR, regional data residency, and privacy requirements. Address ephemeral processing, audit logging, content filtering, and compliance across different jurisdictions.

6. Specialized Domain Translation: Handle specialized domains (medical, legal, technical) requiring domain-specific terminology and accuracy. Design domain detection, specialized model routing, terminology consistency, and expert validation workflows for critical translations.