What is Apache Lucene?
Apache Lucene is an open-source, full-text search library written in Java. Originally created by Doug Cutting, Lucene provides the foundation for many popular search systems, including Elasticsearch, Apache Solr, and Amazon CloudSearch. It indexes and searches large volumes of text and offers configurable relevance scoring and a rich set of query types.
Lucene stores an inverted index, a mapping from each term to the documents that contain it, so a query looks up terms directly instead of scanning every document. On top of this it supports complex queries, faceted search, highlighting, and near-real-time indexing. Because it is a library rather than a server, it suits custom search solutions that need fine-grained control over indexing and search behavior.
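To make the inverted-index idea concrete, here is a minimal sketch in plain Java. The class name `ToyInvertedIndex` and its methods are hypothetical and exist only for illustration; Lucene's real index is segment-based, compressed, and far more elaborate.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy inverted index: maps each term to the sorted set of document IDs containing it.
// Illustration only; this is not how Lucene lays out its postings on disk.
class ToyInvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Tokenizes the text on non-word characters, lowercases it, and records each term.
    // Returns the ID assigned to the new document.
    int add(String text) {
        int docId = nextDocId++;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Single-term lookup: one map access instead of a scan over all documents.
    List<Integer> search(String term) {
        SortedSet<Integer> docs = postings.get(term.toLowerCase(Locale.ROOT));
        return docs == null ? new ArrayList<>() : new ArrayList<>(docs);
    }
}
```

The cost of a lookup depends on the number of documents that contain the term, not on the total size of the collection, which is the property the paragraph above relies on.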
Lucene Core Components
Document & Fields
Core data model for representing searchable content.
• Field types (text, keyword, numeric)
• Index/store/analyze options
• Multi-valued fields
Analyzer
Text processing pipeline for indexing and searching.
• Lowercase filtering
• Stop word removal
• Stemming/lemmatization
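The three analysis steps listed above can be sketched as a small pipeline in plain Java. This is a toy under stated assumptions: `ToyAnalyzer`, its stop word list, and its suffix-stripping rules are hypothetical and much cruder than Lucene's own analyzers and stemmers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy analysis chain: tokenize -> lowercase -> drop stop words -> crude stemming.
class ToyAnalyzer {
    // Tiny, hypothetical stop word list for the example.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "of", "is", "in");

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Split on anything that is not a letter or a decimal digit.
        for (String token : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty()) continue;
            String term = token.toLowerCase(Locale.ROOT); // lowercase filtering
            if (STOP_WORDS.contains(term)) continue;      // stop word removal
            terms.add(stem(term));                        // stemming
        }
        return terms;
    }

    // Very naive suffix stripping; NOT the Porter algorithm or a real lemmatizer.
    static String stem(String term) {
        if (term.length() > 5 && term.endsWith("ing")) {
            return term.substring(0, term.length() - 3);
        }
        if (term.length() > 3 && term.endsWith("s") && !term.endsWith("ss")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }
}
```

The same chain has to run on both documents and query text; otherwise a query for "Indexing" would never meet the indexed term "index".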
IndexWriter
Manages document indexing and index updates.
• Segment management
• Commit strategies
• Near-real-time (NRT) indexing
IndexSearcher
Executes queries and returns ranked results.
• Relevance scoring
• Result ranking
• Field retrieval
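Relevance scoring is easier to reason about with a formula in hand. Recent Lucene versions use BM25 as the default similarity; the sketch below implements the textbook form of BM25 for a single term, using the commonly cited defaults k1 = 1.2 and b = 0.75. The class `Bm25Sketch` is hypothetical, and Lucene's own implementation differs in details such as constant factors and how document length is encoded.

```java
// Textbook BM25 for one query term in one document. Illustration only.
class Bm25Sketch {
    static final double K1 = 1.2;  // term-frequency saturation
    static final double B = 0.75;  // strength of document-length normalization

    // Rare terms (small docFreq relative to docCount) get a larger weight.
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // termFreq: occurrences of the term in the document
    // docLength / avgDocLength: length of this document vs. the collection average
    static double score(double termFreq, double docLength, double avgDocLength,
                        long docCount, long docFreq) {
        double lengthNorm = K1 * (1.0 - B + B * docLength / avgDocLength);
        double tfPart = (K1 + 1.0) * termFreq / (termFreq + lengthNorm);
        return idf(docCount, docFreq) * tfPart;
    }
}
```

Three behaviors follow from the formula: rarer terms score higher, repeated occurrences give diminishing returns, and a match in a shorter document outranks the same match in a longer one.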
Lucene Query Types
Term and Boolean Queries
Basic building blocks for precise matching and logical combinations.
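import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
// Term query - exact match against an indexed term (the term text is not analyzed)
TermQuery termQuery = new TermQuery(new Term("title", "lucene"));
// Boolean query - combine multiple conditions
BooleanQuery.Builder boolQuery = new BooleanQuery.Builder();
boolQuery.add(new TermQuery(new Term("title", "search")), BooleanClause.Occur.MUST);     // required
boolQuery.add(new TermQuery(new Term("content", "engine")), BooleanClause.Occur.SHOULD); // optional, raises the score
Query query = boolQuery.build(); // build the immutable query before searching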
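How MUST and SHOULD clauses interact is worth spelling out. The following plain-Java sketch assumes one MUST clause and one SHOULD clause, each already resolved to the set of matching document IDs; `BooleanSketch` and its clause-counting "score" are hypothetical stand-ins for Lucene's real scoring.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy evaluation of a boolean query with one MUST and one SHOULD clause.
class BooleanSketch {
    // Returns docId -> number of matching clauses (a stand-in for a relevance score).
    static Map<Integer, Integer> search(Set<Integer> mustDocs, Set<Integer> shouldDocs) {
        Map<Integer, Integer> hits = new LinkedHashMap<>();
        // Only documents matching the MUST clause can appear in the result.
        for (int docId : new TreeSet<>(mustDocs)) {
            int matchedClauses = 1;
            // A SHOULD match never adds documents here; it only boosts existing hits.
            if (shouldDocs.contains(docId)) {
                matchedClauses++;
            }
            hits.put(docId, matchedClauses);
        }
        return hits;
    }
}
```

With `mustDocs = {1, 2, 3}` and `shouldDocs = {2, 4}`, document 4 is excluded because it fails the required clause, while document 2 ranks above 1 and 3 because it matches both.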
Phrase and Proximity Queries
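Search for exact phrases, or for terms that occur within a given distance of each other.
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;
// Exact phrase query - "apache" immediately followed by "lucene"
PhraseQuery.Builder phraseQuery = new PhraseQuery.Builder();
phraseQuery.add(new Term("content", "apache"));
phraseQuery.add(new Term("content", "lucene"));
PhraseQuery exactPhrase = phraseQuery.build();
// Proximity query - same terms with a slop of 5 (up to 5 position moves allowed)
phraseQuery.setSlop(5);
PhraseQuery proximityPhrase = phraseQuery.build();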
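Phrase matching depends on token positions, not just on which documents contain the terms. The sketch below, in plain Java, checks an exact phrase (slop 0) against the positions recorded for one document. `PhraseSketch` and both of its methods are hypothetical helpers; a slop greater than 0 relaxes the position check and is not modeled here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy exact-phrase matcher over the token positions of a single document.
class PhraseSketch {
    // Records, for each lowercased term, the positions at which it occurs.
    static Map<String, List<Integer>> positionsOf(String text) {
        Map<String, List<Integer>> positions = new HashMap<>();
        int pos = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) continue;
            positions.computeIfAbsent(token, t -> new ArrayList<>()).add(pos++);
        }
        return positions;
    }

    // True if the phrase terms occur at consecutive positions, in order.
    static boolean matchesExactPhrase(Map<String, List<Integer>> positions, String... phrase) {
        if (phrase.length == 0) return false;
        for (int start : positions.getOrDefault(phrase[0], List.of())) {
            boolean allInPlace = true;
            for (int i = 1; i < phrase.length; i++) {
                // Term i must sit exactly i positions after the first term.
                if (!positions.getOrDefault(phrase[i], List.of()).contains(start + i)) {
                    allInPlace = false;
                    break;
                }
            }
            if (allInPlace) return true;
        }
        return false;
    }
}
```

This is why a field must be indexed with positions for phrase queries to work: document-level postings alone cannot tell "apache lucene" apart from "lucene apache".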
Range and Wildcard Queries
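Search within numeric ranges, or match terms by pattern or by approximate spelling.
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.WildcardQuery;
// Numeric range query - both bounds inclusive; "price" must be indexed as an IntPoint field
Query rangeQuery = IntPoint.newRangeQuery("price", 100, 1000);
// Wildcard query - pattern matching (* matches any character sequence, ? a single character)
WildcardQuery wildcardQuery = new WildcardQuery(new Term("title", "search*"));
// Fuzzy query - matches terms within an edit distance of 2 (the maximum FuzzyQuery supports)
FuzzyQuery fuzzyQuery = new FuzzyQuery(new Term("content", "lucene"), 2);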
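The "edit distance" behind a fuzzy query can be illustrated with the classic Levenshtein algorithm. The plain-Java sketch below counts insertions, deletions, and substitutions; the class `EditDistance` is hypothetical. Note two differences from Lucene: by default `FuzzyQuery` also counts a transposition of adjacent characters as a single edit, and it uses automata rather than comparing the query against each term one at a time.

```java
// Classic Levenshtein distance using two rolling rows of the DP table.
class EditDistance {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a's prefix
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                curr[j] = Math.min(substitution, Math.min(insertion, deletion));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

Under this measure, a fuzzy query for "lucene" with a maximum of 2 edits would accept "lucine" (1 substitution) and "lucen" (1 deletion), but not a term that needs 3 or more edits.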
Real-World Lucene Implementations
LinkedIn
Uses Lucene-based search across profiles, jobs, and content for 800M+ members.
• People search with fuzzy matching
• Job recommendation engine
• Content discovery and news feed
• Real-time indexing of profile updates
Twitter
Uses a customized Lucene to power real-time search across billions of tweets.
• Real-time tweet indexing
• Trending topic detection
• @mention and hashtag search
• Distributed search architecture
Stack Overflow
Elasticsearch (Lucene-based) powers search across 50M+ programming questions.
• Code search with syntax highlighting
• Tag-based filtering and faceting
• Similar question recommendations
• Full-text search across Q&A content
Wikipedia
Uses Lucene through Elasticsearch for searching across 6M+ articles in multiple languages.
• Multi-language search support
• Auto-complete suggestions
• Category and infobox search
• Cross-language search capabilities
Lucene Best Practices
✅ Do
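• Choose analyzers that fit your content and language, and apply the same analysis at index and query time
• Choose field options deliberately: index what you search on, store only what you need to return
• Tune the segment merge policy to your indexing workload
• Use NRT (near-real-time) search when new documents must become searchable quickly
• Cache frequently used queries and filters
• Monitor index size and performance metrics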
❌ Don't
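• Create unnecessarily large documents
• Use wildcard queries with leading wildcards (they can force a scan of much of the term dictionary)
• Neglect index maintenance, such as reclaiming space held by deleted documents
• Store large binary data in index fields
• Create too many small segments (for example, by committing after every document)
• Build query strings from user input without escaping special characters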
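The escaping point deserves a short illustration. Lucene's classic query parser provides a static `QueryParser.escape` helper, and real code should call that. The plain-Java sketch below only shows the idea: it prefixes each query-syntax character with a backslash. The class `QueryEscaper` is hypothetical, and its character list is an assumption based on the classic query parser syntax, so check it against the Lucene version you use.

```java
// Sketch of query-string escaping for untrusted user input.
// Prefer Lucene's own QueryParser.escape in real code.
class QueryEscaper {
    // Assumed set of characters with special meaning in the classic query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    static String escape(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // neutralize the operator
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Without escaping, input such as `title:foo*` is interpreted as a field-scoped wildcard query rather than as literal text, and unbalanced input such as `C++ (beta` can make the parser throw.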