What is FAISS?
FAISS (Facebook AI Similarity Search) is an open-source library developed by Meta AI Research for efficient similarity search and clustering of dense vectors. Unlike managed vector databases such as Pinecone or Weaviate, FAISS gives you complete control over your infrastructure, allowing for extensive customization and optimization of large-scale similarity search workloads.
FAISS excels at handling billions of vectors with various index types optimized for different use cases - from exact search with Flat indices to highly compressed approximate search with Product Quantization. It supports both CPU and GPU implementations, making it ideal for production systems where you need maximum performance control and cost optimization.
FAISS Index Types
Flat Index
Exact brute-force search for small to medium datasets.
• O(n) search complexity
• Best for <100K vectors
• No index build time
• Memory = raw data size
IVF (Inverted File)
Clustering-based approximate search for large datasets.
• O(sqrt(n)) search complexity
• Best for 1M+ vectors
• Tunable nprobe parameter
• K-means clustering overhead
HNSW
Graph-based search with excellent speed-accuracy tradeoff.
• O(log n) search complexity
• Great for real-time queries
• Higher memory overhead
• Excellent recall performance
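A minimal HNSW sketch, using random vectors as stand-ins for real embeddings; the M, efConstruction, and efSearch values below are illustrative starting points, not tuned recommendations.
import faiss
import numpy as np

d = 768
xb = np.random.random((100000, d)).astype('float32')

# M = 32 graph neighbors per node; higher M improves recall at the cost of memory
index = faiss.IndexHNSWFlat(d, 32)
index.hnsw.efConstruction = 200   # build-time candidate list size
index.add(xb)                     # HNSW needs no separate training step

index.hnsw.efSearch = 64          # query-time candidate list size (speed/recall knob)
xq = np.random.random((5, d)).astype('float32')
D, I = index.search(xq, 10)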
Product Quantization (PQ)
Compressed search for memory-constrained environments.
• 10-100x memory compression
• Best for billion+ vectors
• Lossy compression
• Fast compressed search
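A minimal IVF+PQ sketch with random placeholder vectors; the nlist, m, and nbits values are illustrative and should be tuned against your own recall targets. With m = 64 one-byte codes, each 768-dimensional float32 vector (3,072 bytes) compresses to 64 bytes, roughly 48x.
import faiss
import numpy as np

d = 768                # must be divisible by m below
xb = np.random.random((1000000, d)).astype('float32')

nlist = 1024           # IVF clusters
m = 64                 # PQ sub-vectors: 768 dims split into 64 chunks of 12
nbits = 8              # bits per sub-vector code (1 byte each)

quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

index.train(xb)        # learns the IVF clusters and the PQ codebooks
index.add(xb)          # each vector is stored as m = 64 bytes of codes
index.nprobe = 16
D, I = index.search(xb[:5], 10)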
FAISS Implementation Examples
Basic FAISS Index Creation
Create and populate a FAISS index for similarity search.
import faiss
import numpy as np
# Generate sample vectors (1M vectors, 768 dimensions)
d = 768 # dimension
nb = 1000000 # database size
np.random.seed(1234)
xb = np.random.random((nb, d)).astype('float32')
xb[:, 0] += np.arange(nb) / 1000.
# Build index
index = faiss.IndexFlatL2(d) # L2 distance
print(index.is_trained) # True for Flat index
# Add vectors to index
index.add(xb)
print(index.ntotal) # 1000000
# Search
k = 4 # top-k results
xq = np.random.random((5, d)).astype('float32')
D, I = index.search(xq, k) # distances, indices
print(I[:5]) # neighbors of first 5 queries
IVF Index with nprobe Tuning
Create an IVF index for large-scale approximate search with accuracy tuning.
import faiss
import numpy as np
import time

# Reuses xb (database vectors) and xq (queries) from the basic example above
d = 768
nlist = 1024  # number of IVF clusters
k = 10        # top-k, matching the recall@10 figures below

quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_L2)

# Train the index (runs k-means clustering over the database vectors)
print(index.is_trained)  # False - needs training
index.train(xb)
print(index.is_trained)  # True after training

# Add vectors
index.add(xb)

# Ground truth from an exact Flat index, used to measure recall
gt_index = faiss.IndexFlatL2(d)
gt_index.add(xb)
_, ground_truth = gt_index.search(xq, k)

# Search with different nprobe values
for nprobe in [1, 10, 100]:
    index.nprobe = nprobe
    start = time.time()
    D, I = index.search(xq, k)
    elapsed = time.time() - start
    # Recall@k: fraction of exact neighbors recovered by the approximate search
    recall = np.mean([len(set(I[i]) & set(ground_truth[i])) / k
                      for i in range(len(xq))])
    print(f"nprobe={nprobe}, time={elapsed:.3f}s, recall@{k}={recall:.3f}")

# Output example:
# nprobe=1, time=0.002s, recall@10=0.621
# nprobe=10, time=0.011s, recall@10=0.952
# nprobe=100, time=0.087s, recall@10=0.991
Production FAISS Service
Production-ready FAISS service with sharding and load balancing.
import faiss
from flask import Flask, request, jsonify
import numpy as np
from concurrent.futures import ThreadPoolExecutor

class FAISSService:
    def __init__(self, index_paths, shard_config):
        self.shards = []
        self.offsets = []  # global ID offset of each shard
        self.executor = ThreadPoolExecutor(max_workers=8)
        # Load multiple index shards; shards are assumed to hold contiguous ID ranges
        offset = 0
        for path in index_paths:
            index = faiss.read_index(path)
            if hasattr(index, 'nprobe'):
                index.nprobe = shard_config.get('nprobe', 32)
            self.shards.append(index)
            self.offsets.append(offset)
            offset += index.ntotal

    def search_shard(self, shard_idx, query_vectors, k):
        """Search a single shard."""
        index = self.shards[shard_idx]
        distances, indices = index.search(query_vectors, k)
        # Shift local indices by the shard's offset to get global IDs
        global_indices = indices + self.offsets[shard_idx]
        return distances, global_indices

    def search(self, query_vectors, k=10):
        """Parallel search across all shards."""
        futures = []
        # Submit one search task per shard
        for i in range(len(self.shards)):
            future = self.executor.submit(
                self.search_shard, i, query_vectors, k
            )
            futures.append(future)
        # Collect results from all shards
        all_distances, all_indices = [], []
        for future in futures:
            distances, indices = future.result()
            all_distances.append(distances)
            all_indices.append(indices)
        # Merge and re-rank top-k results
        return self.merge_results(all_distances, all_indices, k)

    def merge_results(self, distances_list, indices_list, k):
        """Merge per-shard results into a global top-k."""
        batch_size = distances_list[0].shape[0]
        merged_distances = np.zeros((batch_size, k))
        merged_indices = np.zeros((batch_size, k), dtype=np.int64)
        for i in range(batch_size):
            # Collect all candidates from all shards
            candidates = []
            for shard_distances, shard_indices in zip(distances_list, indices_list):
                for j in range(k):
                    candidates.append((shard_distances[i, j], shard_indices[i, j]))
            # Sort by distance and keep the top-k
            candidates.sort(key=lambda x: x[0])
            for j in range(k):
                merged_distances[i, j] = candidates[j][0]
                merged_indices[i, j] = candidates[j][1]
        return merged_distances, merged_indices

# Flask API wrapper
app = Flask(__name__)
faiss_service = FAISSService(
    index_paths=['shard_0.index', 'shard_1.index'],
    shard_config={'nprobe': 32}
)

@app.route('/search', methods=['POST'])
def search():
    data = request.json
    query_vectors = np.array(data['vectors'], dtype=np.float32)
    k = data.get('k', 10)
    distances, indices = faiss_service.search(query_vectors, k)
    return jsonify({
        'distances': distances.tolist(),
        'indices': indices.tolist()
    })

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
Real-World FAISS Implementations
Meta (Facebook)
Uses FAISS for large-scale similarity search across billions of images and embeddings.
• 10+ billion image embeddings indexed
• Custom IVF implementations for photo search
• GPU-accelerated FAISS for real-time inference
• PQ compression for memory efficiency at scale
Spotify
Leverages FAISS for music recommendation and audio similarity search.
• 50+ million song embeddings
• HNSW indices for low-latency recommendations
• Sharded deployment across multiple regions
• Real-time playlist generation using FAISS search
Pinterest
Implements FAISS for visual search and content discovery across billions of pins.
• 5+ billion pin embeddings for visual search
• Custom index sharding and replication
• Multi-GPU FAISS deployment for high throughput
• A/B testing different index configurations
Instacart
Uses FAISS for product search and recommendation in grocery e-commerce.
• 100+ million product embeddings
• IVF indices optimized for product search
• Real-time inventory-aware similarity search
• Multi-modal search combining text and images
FAISS Deployment Strategies
Single Machine
Simple deployment for moderate scale workloads.
• <10M vectors
• Single index file
• GPU acceleration (see the sketch after this list)
• Simple to manage
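The single-machine path pairs naturally with GPU acceleration. A minimal sketch, assuming the faiss-gpu build is installed and reusing the index and xq variables from the earlier examples:
import faiss

res = faiss.StandardGpuResources()                  # manages GPU memory and streams
gpu_index = faiss.index_cpu_to_gpu(res, 0, index)   # copy the CPU index to GPU device 0
D, I = gpu_index.search(xq, 10)                     # same search API as on CPU

# Alternatively, spread one index across all visible GPUs
multi_gpu_index = faiss.index_cpu_to_all_gpus(index)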
Sharded Deployment
Horizontal scaling across multiple machines.
• 10M-1B+ vectors
• Load balancing
• Result merging
• Fault tolerance
Kubernetes
Cloud-native deployment with auto-scaling.
• Container orchestration
• Auto-scaling
• Rolling updates
• Resource management
FAISS Best Practices
✅ Do
• Benchmark different index types for your use case
• Use GPU acceleration for large-scale deployments
• Implement proper index sharding for >100M vectors
• Monitor index build times and memory usage
• Tune the nprobe parameter based on accuracy requirements
• Use PQ compression for memory-constrained environments
• Implement proper error handling and fallbacks
• Version your indices for safe deployments (see the sketch after this list)
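One way to apply the versioning and fallback points above, as a minimal sketch; the file names and rollback scheme are illustrative conventions, not part of FAISS itself:
import faiss
import os

# Persist the trained index under an explicit version tag
version = "v42"
faiss.write_index(index, f"products_{version}.index")

# On deploy, load the pinned version and fall back to the previous one if it is missing
path = f"products_{version}.index"
if os.path.exists(path):
    index = faiss.read_index(path)
else:
    index = faiss.read_index("products_v41.index")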
❌ Don't
• Use a Flat index for millions of vectors
• Ignore index build time in production planning
• Set nprobe too low without accuracy validation
• Forget to normalize vectors when using cosine similarity (see the sketch after this list)
• Deploy without proper monitoring and alerting
• Use FAISS for real-time updates (it's read-optimized)
• Ignore memory fragmentation in long-running services
• Mix different distance metrics in the same index
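On the cosine-similarity point: FAISS has no cosine metric, so the standard approach is to L2-normalize vectors and search with an inner-product index. A minimal sketch with random placeholder vectors:
import faiss
import numpy as np

d = 768
xb = np.random.random((10000, d)).astype('float32')
xq = np.random.random((5, d)).astype('float32')

# Cosine similarity equals the inner product of L2-normalized vectors
faiss.normalize_L2(xb)         # in-place normalization
faiss.normalize_L2(xq)

index = faiss.IndexFlatIP(d)   # inner-product (maximum similarity) index
index.add(xb)
D, I = index.search(xq, 10)    # D now contains cosine similarities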