
FAISS

Master FAISS (Facebook AI Similarity Search): vector indexing, similarity search optimization, and production deployment patterns.


What is FAISS?

FAISS (Facebook AI Similarity Search) is an open-source library developed by Meta AI Research for efficient similarity search and clustering of dense vectors. Unlike managed vector databases such as Pinecone or Weaviate, FAISS gives you complete control over your infrastructure, allowing for extensive customization and optimization of large-scale similarity search workloads.

FAISS excels at handling billions of vectors, with index types optimized for different use cases: from exact search with Flat indices to highly compressed approximate search with Product Quantization. It supports both CPU and GPU implementations, making it well suited to production systems that need maximum performance control and cost optimization.

FAISS Index Calculator

[Interactive calculator: estimates index memory, query throughput (QPS), search accuracy, build time, shard count, and monthly cost for a chosen index configuration.]

FAISS Index Types

Flat Index

Exact brute-force search for small to medium datasets.

• 100% accuracy (exact search)
• O(n) search complexity
• Best for <100K vectors
• No index build time
• Memory = raw data size

IVF (Inverted File)

Clustering-based approximate search for large datasets.

• 90-99% accuracy
• O(sqrt(n)) search complexity
• Best for 1M+ vectors
• Tunable nprobe parameter
• K-means clustering overhead

HNSW

Graph-based search with an excellent speed-accuracy tradeoff; a minimal sketch follows the list below.

• 95%+ accuracy
• O(log n) search complexity
• Great for real-time queries
• Higher memory overhead
• Excellent recall performance
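
A minimal HNSW sketch. IndexHNSWFlat and its efConstruction/efSearch parameters are standard FAISS APIs; the specific values below are illustrative, not tuned.

HNSW Index Sketch
import faiss
import numpy as np

d = 768
xb = np.random.random((100000, d)).astype('float32')

# M=32 graph neighbors per node: higher M improves recall at the cost of memory
index = faiss.IndexHNSWFlat(d, 32)
index.hnsw.efConstruction = 200  # candidate list size during graph construction
index.add(xb)                    # HNSW requires no separate training step

index.hnsw.efSearch = 64         # candidate list size at query time (recall/speed knob)
xq = np.random.random((5, d)).astype('float32')
D, I = index.search(xq, 10)      # distances and neighbor IDs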

Product Quantization (PQ)

Compressed search for memory-constrained environments; a minimal IVFPQ sketch follows the list below.

• 80-90% accuracy
• 10-100x memory compression
• Best for billion+ vectors
• Lossy compression
• Fast compressed search
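
A minimal IVFPQ sketch combining IVF clustering with PQ compression. IndexIVFPQ is the standard FAISS class for this; the nlist, m, and nbits values below are illustrative, not tuned.

IVFPQ Index Sketch
import faiss
import numpy as np

d = 768
nlist = 256   # IVF coarse clusters
m = 64        # PQ sub-quantizers (must divide d)
nbits = 8     # bits per sub-quantizer -> 256 centroids each

quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

xb = np.random.random((100000, d)).astype('float32')
index.train(xb)  # learns both the coarse clusters and the PQ codebooks
index.add(xb)    # each vector stored as m*nbits bits = 64 bytes vs 3072 bytes raw

index.nprobe = 16
xq = np.random.random((5, d)).astype('float32')
D, I = index.search(xq, 10)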

FAISS Implementation Examples

Basic FAISS Index Creation

Create and populate a FAISS index for similarity search.

Python FAISS Setup
import faiss
import numpy as np

# Generate sample vectors (1M vectors, 768 dimensions)
d = 768  # dimension
nb = 1000000  # database size
np.random.seed(1234)
xb = np.random.random((nb, d)).astype('float32')
xb[:, 0] += np.arange(nb) / 1000.

# Build index
index = faiss.IndexFlatL2(d)   # L2 distance
print(index.is_trained)       # True for Flat index

# Add vectors to index
index.add(xb)
print(index.ntotal)           # 1000000

# Search
k = 4  # top-k results
xq = np.random.random((5, d)).astype('float32')
D, I = index.search(xq, k)    # distances, indices
print(I[:5])                  # neighbors of first 5 queries
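
The example above uses L2 distance. For cosine similarity, the FAISS convention is to L2-normalize vectors and search with an inner-product index; a minimal sketch:

Cosine Similarity via Inner Product
import faiss
import numpy as np

d = 768
xb = np.random.random((10000, d)).astype('float32')
faiss.normalize_L2(xb)        # in-place L2 normalization

index = faiss.IndexFlatIP(d)  # inner product equals cosine after normalization
index.add(xb)

xq = np.random.random((5, d)).astype('float32')
faiss.normalize_L2(xq)        # queries must be normalized too
D, I = index.search(xq, 4)    # D now holds cosine similarities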

IVF Index with nprobe Tuning

Create an IVF index for large-scale approximate search with accuracy tuning.

IVF Index Implementation
import faiss
import numpy as np
import time

# Reuses xb and xq from the previous example
d = 768
k = 10
nlist = 1024  # number of coarse clusters (Voronoi cells)
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_L2)

# Train the index (k-means clustering of the database vectors)
print(index.is_trained)  # False - needs training
index.train(xb)
print(index.is_trained)  # True after training

# Add vectors
index.add(xb)

# Ground truth from an exact Flat index, used to measure recall
gt_index = faiss.IndexFlatL2(d)
gt_index.add(xb)
_, ground_truth = gt_index.search(xq, k)

def evaluate_recall(I, gt):
    """Average fraction of true top-k neighbors recovered per query."""
    return np.mean([len(set(I[i]) & set(gt[i])) / len(gt[i])
                    for i in range(len(gt))])

# Search with different nprobe values (number of clusters scanned per query)
for nprobe in [1, 10, 100]:
    index.nprobe = nprobe
    start = time.time()
    D, I = index.search(xq, k)
    elapsed = time.time() - start

    # Evaluate accuracy against the exact-search ground truth
    accuracy = evaluate_recall(I, ground_truth)
    print(f"nprobe={nprobe}, time={elapsed:.3f}s, recall@{k}={accuracy:.3f}")

# Example output:
# nprobe=1, time=0.002s, recall@10=0.621
# nprobe=10, time=0.011s, recall@10=0.952
# nprobe=100, time=0.087s, recall@10=0.991
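
The production service below loads prebuilt index files from disk. Persisting a trained, populated index is a single call; a minimal sketch (the shard_0.index filename matches the service configuration below):

Index Persistence
import faiss

# Save the trained, populated IVF index from the example above
faiss.write_index(index, 'shard_0.index')

# Reload it later, e.g., at service startup
index = faiss.read_index('shard_0.index')
print(index.ntotal)  # vector count survives the round trip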

Production FAISS Service

Production-ready FAISS service with sharding and load balancing.

Scalable FAISS Service
import faiss
from flask import Flask, request, jsonify
import numpy as np
from concurrent.futures import ThreadPoolExecutor

class FAISSService:
    def __init__(self, index_paths, shard_config):
        self.shards = []
        self.shard_offsets = []  # maps each shard's local IDs to global IDs
        self.executor = ThreadPoolExecutor(max_workers=8)
        
        # Load multiple index shards and record each shard's ID offset
        offset = 0
        for path in index_paths:
            index = faiss.read_index(path)
            if hasattr(index, 'nprobe'):
                index.nprobe = shard_config.get('nprobe', 32)
            self.shards.append(index)
            self.shard_offsets.append(offset)
            offset += index.ntotal
    
    def search_shard(self, shard_idx, query_vectors, k):
        """Search a single shard"""
        index = self.shards[shard_idx]
        distances, indices = index.search(query_vectors, k)
        
        # Map local indices to global IDs, preserving -1 "no result" markers
        global_indices = np.where(
            indices >= 0, indices + self.shard_offsets[shard_idx], -1
        )
        return distances, global_indices
    
    def search(self, query_vectors, k=10):
        """Parallel search across all shards"""
        futures = []
        
        # Submit search tasks for each shard
        for i, shard in enumerate(self.shards):
            future = self.executor.submit(
                self.search_shard, i, query_vectors, k
            )
            futures.append(future)
        
        # Collect results from all shards
        all_distances, all_indices = [], []
        for future in futures:
            distances, indices = future.result()
            all_distances.append(distances)
            all_indices.append(indices)
        
        # Merge and re-rank top-k results
        return self.merge_results(all_distances, all_indices, k)
    
    def merge_results(self, distances_list, indices_list, k):
        """Merge results from multiple shards"""
        batch_size = distances_list[0].shape[0]
        merged_distances = np.zeros((batch_size, k))
        merged_indices = np.zeros((batch_size, k), dtype=np.int64)
        
        for i in range(batch_size):
            # Collect all candidates from all shards
            candidates = []
            for shard_distances, shard_indices in zip(distances_list, indices_list):
                for j in range(k):
                    candidates.append((shard_distances[i,j], shard_indices[i,j]))
            
            # Sort and take top-k
            candidates.sort(key=lambda x: x[0])
            for j in range(k):
                merged_distances[i,j] = candidates[j][0]
                merged_indices[i,j] = candidates[j][1]
        
        return merged_distances, merged_indices

# Flask API wrapper
app = Flask(__name__)
faiss_service = FAISSService(
    index_paths=['shard_0.index', 'shard_1.index'], 
    shard_config={'nprobe': 32}
)

@app.route('/search', methods=['POST'])
def search():
    data = request.json
    query_vectors = np.array(data['vectors'], dtype=np.float32)
    k = data.get('k', 10)
    
    distances, indices = faiss_service.search(query_vectors, k)
    
    return jsonify({
        'distances': distances.tolist(),
        'indices': indices.tolist()
    })

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
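
A quick smoke test against the running service, assuming it is listening on localhost:5000 as configured above (uses the third-party requests library):

Client Usage Example
import numpy as np
import requests

query = np.random.random((1, 768)).astype('float32')
resp = requests.post(
    'http://localhost:5000/search',
    json={'vectors': query.tolist(), 'k': 5}
)
result = resp.json()
print(result['indices'][0])    # global IDs of the 5 nearest neighbors
print(result['distances'][0])  # corresponding L2 distances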

Real-World FAISS Implementations

Meta (Facebook)

Uses FAISS for large-scale similarity search across billions of images and embeddings.

• 10+ billion image embeddings indexed
• Custom IVF implementations for photo search
• GPU-accelerated FAISS for real-time inference
• PQ compression for memory efficiency at scale

Spotify

Leverages FAISS for music recommendation and audio similarity search.

• 50+ million song embeddings
• HNSW indices for low-latency recommendations
• Sharded deployment across multiple regions
• Real-time playlist generation using FAISS search

Pinterest

Implements FAISS for visual search and content discovery across billions of pins.

• 5+ billion pin embeddings for visual search
• Custom index sharding and replication
• Multi-GPU FAISS deployment for high throughput
• A/B testing different index configurations

Instacart

Uses FAISS for product search and recommendation in grocery e-commerce.

• 100+ million product embeddings
• IVF indices optimized for product search
• Real-time inventory-aware similarity search
• Multi-modal search combining text and images

FAISS Deployment Strategies

Single Machine

Simple deployment for moderate scale workloads.

• <10M vectors
• Single index file
• GPU acceleration (see the sketch after this list)
• Simple to manage
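
Moving an index onto a GPU is a short operation with the faiss-gpu build; a minimal sketch, assuming a CUDA-capable GPU and the faiss-gpu package installed:

GPU Acceleration Sketch
import faiss
import numpy as np

d = 768
xb = np.random.random((1000000, d)).astype('float32')

cpu_index = faiss.IndexFlatL2(d)

# Move the index to GPU 0 (requires the faiss-gpu build)
res = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)
gpu_index.add(xb)

xq = np.random.random((5, d)).astype('float32')
D, I = gpu_index.search(xq, 10)  # brute-force search, accelerated on GPU

# Copy back to CPU, e.g., before writing the index to disk
cpu_index = faiss.index_gpu_to_cpu(gpu_index)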

Sharded Deployment

Horizontal scaling across multiple machines.

• 10M-1B+ vectors
• Load balancing
• Result merging
• Fault tolerance

Kubernetes

Cloud-native deployment with auto-scaling.

• Container orchestration
• Auto-scaling
• Rolling updates
• Resource management

FAISS Best Practices

✅ Do

• Benchmark different index types for your use case
• Use GPU acceleration for large-scale deployments
• Implement proper index sharding for >100M vectors
• Monitor index build times and memory usage
• Tune the nprobe parameter based on accuracy requirements
• Use PQ compression for memory-constrained environments
• Implement proper error handling and fallbacks
• Version your indices for safe deployments

❌ Don't

• Use a Flat index for millions of vectors
• Ignore index build time in production planning
• Set nprobe too low without accuracy validation
• Forget to normalize vectors when using cosine similarity
• Deploy without proper monitoring and alerting
• Use FAISS for high-churn real-time updates; it is read-optimized (see the sketch after this list)
• Ignore memory fragmentation in long-running services
• Mix different distance metrics in the same index
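
FAISS is read-optimized, but batched updates are workable when you manage stable external IDs with IndexIDMap; a minimal sketch (the ID values are illustrative):

Batched Updates with IndexIDMap
import faiss
import numpy as np

d = 768
base = faiss.IndexFlatL2(d)
index = faiss.IndexIDMap(base)  # lets you assign your own 64-bit IDs

xb = np.random.random((1000, d)).astype('float32')
ids = np.arange(1000, 2000).astype('int64')  # e.g., database primary keys
index.add_with_ids(xb, ids)

# Remove a batch of vectors by external ID (supported here; HNSW does not support removal)
index.remove_ids(np.array([1000, 1001], dtype='int64'))
print(index.ntotal)  # 998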