What is Elasticsearch?

Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene. It offers multitenant-capable full-text search over schema-free JSON documents, exposed through an HTTP interface.

Queries/sec per node: 10K+

Search latency: <50ms

Index capacity per node

Search availability: 99.9%

Key Features & Use Cases

Core Capabilities

Near Real-time Search

Fast indexing and search capabilities

Distributed Architecture

Automatic sharding and replication

Rich Query DSL

Complex queries with aggregations

RESTful API

JSON over HTTP interface

Common Use Cases

Application and website search

Log and event data analysis

Business intelligence and analytics

Infrastructure and security monitoring

Geospatial data analysis

Core Concepts

Index

A collection of documents with similar characteristics

Key Features

Schema mapping
Configurable settings
Multiple shards
Custom analyzers

Example

products, users, logs-2024-01, articles
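To make the index features above concrete, here is a minimal sketch of a create-index request body that combines shard settings, a custom analyzer, and a schema mapping. The field names (`name`, `description`, `price`, `created_at`) and the analyzer name `product_text` are assumptions chosen for illustration; they do not come from any real schema.

```python
import json


def build_index_body(primary_shards=5, replicas=1):
    """Return a create-index request body (settings + mappings).

    The analyzer name and field names are hypothetical examples.
    """
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
            "analysis": {
                "analyzer": {
                    # Custom analyzer: standard tokenizer, then lowercase
                    # and ASCII-folding token filters.
                    "product_text": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "asciifolding"],
                    }
                }
            },
        },
        "mappings": {
            "properties": {
                "name": {"type": "text", "analyzer": "product_text"},
                "description": {"type": "text", "analyzer": "product_text"},
                "price": {"type": "float"},
                "created_at": {"type": "date"},
            }
        },
    }


if __name__ == "__main__":
    # This JSON would be sent as the body of: PUT /products
    print(json.dumps(build_index_body(), indent=2))
```

The number of primary shards is fixed when the index is created, while the replica count can be changed later, so the shard count is the setting that deserves the most thought up front.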

🧮 Elasticsearch Cluster Calculator

Cluster Configuration

Nodes: 3

Primary Shards: 5

Replica Shards: 1

Docs per Shard: 1,000,000

Performance Metrics

Index Size: 29297 GB

Search QPS: 900

Indexing Rate: 12,000 docs/sec

Fault Tolerance: 1 failure

Can survive 1 node failure
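Some of the calculator's figures follow directly from the configuration. The sketch below derives only those quantities (total shard copies, unique documents, tolerated node failures). It does not attempt to reproduce the index size, QPS, or indexing rate, because the page does not state the formulas behind them. The function name and the simplified fault-tolerance rule are assumptions.

```python
def cluster_summary(nodes, primary_shards, replicas, docs_per_shard):
    """Derive basic cluster figures from the configuration.

    Simplified model: every shard has `replicas` extra copies, and a
    replica is never placed on the same node as its primary.
    """
    # Each primary shard plus its replicas.
    total_shard_copies = primary_shards * (1 + replicas)

    # Replicas duplicate documents, so unique docs count primaries only.
    unique_docs = primary_shards * docs_per_shard

    # At most nodes - 1 replicas per shard can actually be assigned,
    # so that bounds how many node losses leave a full copy of the data.
    node_failures_tolerated = min(replicas, max(nodes - 1, 0))

    return {
        "total_shard_copies": total_shard_copies,
        "unique_docs": unique_docs,
        "avg_shard_copies_per_node": total_shard_copies / nodes,
        "node_failures_tolerated": node_failures_tolerated,
    }


if __name__ == "__main__":
    # Configuration shown in the calculator above.
    print(cluster_summary(nodes=3, primary_shards=5, replicas=1,
                          docs_per_shard=1_000_000))
```

With 3 nodes, 5 primaries, and 1 replica, this gives 10 shard copies and a tolerance of 1 node failure, which agrees with the fault-tolerance figure shown by the calculator.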

Search Patterns & Use Cases

Full-Text Search

Search across text content with relevance scoring

Query Example

{"match": {"description": {"query": "wireless headphones", "fuzziness": "AUTO"}}}

✅ Advantages

Natural language queries
Typo tolerance
Relevance scoring
Highlighting

⚠️ Considerations

Index size overhead
Analysis complexity
Language-specific needs
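The query example above is only the `match` clause. A complete search request usually wraps it in a `query` key and can also request highlighting. The following is a minimal sketch of such a body. The helper name, the default field, and the `products` index mentioned in the comment are assumptions.

```python
import json


def full_text_search_body(text, field="description", size=10):
    """Build a search body: fuzzy match query plus highlighting."""
    return {
        "size": size,
        "query": {
            "match": {
                # "AUTO" picks the allowed edit distance from term length,
                # which is what provides typo tolerance.
                field: {"query": text, "fuzziness": "AUTO"}
            }
        },
        # Ask for highlighted snippets from the same field.
        "highlight": {"fields": {field: {}}},
    }


if __name__ == "__main__":
    # This JSON would be sent as the body of: POST /products/_search
    print(json.dumps(full_text_search_body("wireless headphones"), indent=2))
```

Hits come back ordered by relevance score, and each hit carries a `highlight` section with the matching fragments.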

Architecture Patterns

Search-First Architecture

Primary Data Store

Elasticsearch as the main database for search-heavy applications

Use case: Content management, catalogs, documentation

Search-Only Layer

Dedicated search index with data synced from primary database

Use case: E-commerce, user directories, content discovery

Analytics Engine

Time-series data analysis and real-time dashboards

Use case: Log analysis, metrics, business intelligence
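For the analytics-engine pattern, a typical dashboard query buckets events over time. The sketch below builds a `date_histogram` aggregation over recent error logs, split by service. The field names `@timestamp`, `level`, and `service` are assumptions about the log schema, and `service` is assumed to be a keyword field.

```python
import json


def errors_over_time_body(interval="5m", since="now-1h"):
    """Build an aggregation-only search body for an error-rate chart."""
    return {
        # No individual hits are needed, only the aggregation buckets.
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"level": "error"}},
                    {"range": {"@timestamp": {"gte": since}}},
                ]
            }
        },
        "aggs": {
            "per_interval": {
                "date_histogram": {
                    "field": "@timestamp",
                    "fixed_interval": interval,
                },
                # Sub-aggregation: top services within each time bucket.
                "aggs": {
                    "by_service": {"terms": {"field": "service", "size": 5}}
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(errors_over_time_body(), indent=2))
```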

Integration Patterns

Change Data Capture

Real-time sync from database changes to search index

Database → Debezium → Kafka → Logstash → ES

Dual Write Pattern

Application writes to both primary DB and search index

App → PostgreSQL + Elasticsearch

Event-Driven Sync

Event-based eventual consistency with message queues

App → Events → Queue → Index Worker
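In the event-driven pattern, the index worker typically batches events into a single Bulk API request. The sketch below converts a batch of change events into the newline-delimited JSON that the `_bulk` endpoint expects. The event shape (`op`, `id`, `doc`) and the function name are assumptions made for illustration. Reading from the queue and sending the HTTP request are left out.

```python
import json


def events_to_bulk(events, index="products"):
    """Convert change events into an NDJSON payload for POST /_bulk.

    Each event is assumed to look like:
      {"op": "upsert", "id": "42", "doc": {...}}  or
      {"op": "delete", "id": "42"}
    """
    lines = []
    for event in events:
        meta = {"_index": index, "_id": event["id"]}
        if event["op"] == "delete":
            # Delete actions have no source line.
            lines.append(json.dumps({"delete": meta}))
        else:
            # "index" with an explicit _id overwrites the document, so
            # replaying the same event after a retry is harmless.
            lines.append(json.dumps({"index": meta}))
            lines.append(json.dumps(event["doc"]))
    if not lines:
        return ""
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    batch = [
        {"op": "upsert", "id": "1", "doc": {"name": "Wireless headphones"}},
        {"op": "delete", "id": "2"},
    ]
    print(events_to_bulk(batch), end="")
```

Using the source record's primary key as `_id` keeps the worker idempotent, which matters because most message queues deliver at least once.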

🏢 Real-world Implementations

GitHub: Code Search

• 100+ million repositories indexed

• Complex code syntax analysis

• Real-time repository updates

• Advanced filtering and facets

Pattern: Specialized analyzers for code, real-time indexing pipeline

Airbnb: Search & Discovery

• 7+ million listings globally

• Geo-spatial search with filters

• ML-powered ranking signals

• Real-time availability updates

Pattern: Geo queries, ML ranking, faceted search with dynamic pricing

Netflix: Content Discovery

• Personalized content search

• Multi-language content indexing

• Real-time viewing analytics

• A/B testing search algorithms

Pattern: Personalization, multi-language, analytics-driven optimization

Uber: Operational Analytics

• Real-time trip and driver analytics

• Fraud detection and monitoring

• Operational dashboards

• Log aggregation across services

Pattern: Time-series analysis, real-time monitoring, multi-service logging

💡 Key Takeaways

• Index Design: Plan your mapping and analyzers carefully, since changing the mapping or analyzer of an existing field requires reindexing
• Shard Strategy: Right-size shards (10-50GB) and plan for growth patterns
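• Query Performance: Put exact-match and range conditions in filter context rather than query context where possible (filters skip relevance scoring and can be cached), and cache frequent searches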
• Data Modeling: Denormalize for search performance, consider parent-child relationships
• Monitoring: Watch cluster health, query latency, and indexing performance
• Scaling: Scale horizontally by spreading shards across nodes; adding nodes helps only when there are enough shards to distribute over them
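Two of these takeaways can be sketched in a few lines: sizing primary shards toward the 10-50GB range, and placing yes/no conditions in filter context. The helper names, the 30GB default target, and the `category` and `price` fields are assumptions for illustration, not a sizing rule from Elasticsearch itself.

```python
import math


def primary_shards_for(expected_index_gb, target_shard_gb=30):
    """Estimate a primary shard count for an expected index size.

    target_shard_gb is an assumed midpoint of the 10-50GB guideline.
    """
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


def filtered_search_body(text, category, max_price):
    """Scored full-text match, with non-scoring conditions as filters."""
    return {
        "query": {
            "bool": {
                # "must" clauses contribute to the relevance score.
                "must": [{"match": {"description": text}}],
                # "filter" clauses only include or exclude documents:
                # they are not scored and their results can be cached.
                "filter": [
                    {"term": {"category": category}},
                    {"range": {"price": {"lte": max_price}}},
                ],
            }
        }
    }


if __name__ == "__main__":
    print(primary_shards_for(200))
    print(filtered_search_body("wireless headphones", "audio", 150))
```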
