Retrieval-Augmented Generation (RAG)
Master RAG systems for building accurate, context-aware AI applications that combine retrieval and generation
35 min read•Advanced
RAG combines the power of information retrieval with language generation to provide accurate, contextual answers based on external knowledge sources.
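- Use case: Q&A systems, the most common enterprise AI application
- Accuracy boost: 30-50% improvement over pure generation
- Hallucination reduction: 60-80%, because answers are grounded in real documents
- Knowledge updates: real-time, with no model retraining needed
These percentages are rough estimates; actual gains depend on the corpus, the retriever, and the model.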
RAG Architecture
Complete RAG system with retrieval, augmentation, and generation components
```python
# Complete RAG System Implementation
# Requires: pip install chromadb sentence-transformers openai
# openai.OpenAI() reads the API key from the OPENAI_API_KEY environment variable.
from typing import Any, Dict, List, Optional

import chromadb
import openai
from sentence_transformers import SentenceTransformer


class RAGSystem:
    def __init__(self, collection_name: str = "documents"):
        # Initialize an in-memory vector database. get_or_create_collection
        # avoids an error when a collection with this name already exists.
        self.chroma_client = chromadb.Client()
        self.collection = self.chroma_client.get_or_create_collection(
            name=collection_name,
            metadata={"hnsw:space": "cosine"},  # distance = 1 - cosine similarity
        )
        # Initialize embedding model (produces 384-dimensional sentence vectors)
        self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
        # Initialize LLM client
        self.llm_client = openai.OpenAI()

    def add_documents(self, documents: List[str], metadatas: Optional[List[Dict]] = None):
        """Embed documents and add them to the vector store."""
        # Generate embeddings
        embeddings = self.embedding_model.encode(documents).tolist()
        # Offset IDs by the current collection size so repeated calls don't collide
        start = self.collection.count()
        ids = [f"doc_{start + i}" for i in range(len(documents))]
        # Add to collection; pass None rather than empty dicts when no metadata
        # is given (some Chroma versions reject empty metadata)
        self.collection.add(
            documents=documents,
            embeddings=embeddings,
            metadatas=metadatas,
            ids=ids,
        )
        print(f"Added {len(documents)} documents to the collection")

    def retrieve_documents(self, query: str, top_k: int = 5) -> List[Dict]:
        """Retrieve the top_k most similar documents for a query."""
        # Generate query embedding (a list containing one vector)
        query_embedding = self.embedding_model.encode([query]).tolist()
        # Never request more results than the collection holds
        n_results = min(top_k, self.collection.count())
        if n_results == 0:
            return []
        # Search in vector database
        results = self.collection.query(
            query_embeddings=query_embedding,
            n_results=n_results,
        )
        # Chroma returns one result list per query embedding; index 0 is our query
        retrieved_docs = []
        for i in range(len(results['documents'][0])):
            retrieved_docs.append({
                'content': results['documents'][0][i],
                'metadata': results['metadatas'][0][i],
                'distance': results['distances'][0][i],
                'id': results['ids'][0][i],
            })
        return retrieved_docs

    def generate_response(self, query: str, retrieved_docs: List[Dict]) -> str:
        """Generate a response with the LLM, grounded in the retrieved context."""
        # Build context from retrieved documents
        context = "\n\n".join(
            f"Document {i + 1}: {doc['content']}"
            for i, doc in enumerate(retrieved_docs)
        )
        # Create prompt with context
        prompt = (
            f"Context:\n{context}\n\n"
            f"Question: {query}\n\n"
            "Please answer the question based on the provided context. If the context "
            "doesn't contain enough information to answer the question, please say so."
        )
        # Generate response; a low temperature keeps the answer close to the context
        response = self.llm_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful assistant that answers questions based on provided context."},
                {"role": "user", "content": prompt},
            ],
            temperature=0,
            max_tokens=500,
        )
        return response.choices[0].message.content

    def query(self, question: str, top_k: int = 5) -> Dict[str, Any]:
        """Complete RAG pipeline: retrieve, then generate."""
        # Step 1: Retrieve relevant documents
        retrieved_docs = self.retrieve_documents(question, top_k)
        # Step 2: Generate response with context
        response = self.generate_response(question, retrieved_docs)
        return {
            'question': question,
            'answer': response,
            'retrieved_documents': retrieved_docs,
            'num_documents_retrieved': len(retrieved_docs),
        }


# Example usage
rag = RAGSystem("knowledge_base")
# Add documents
documents = [
    "Machine learning is a subset of artificial intelligence that enables computers to learn without being explicitly programmed.",
    "Deep learning uses neural networks with multiple layers to model and understand complex patterns in data.",
    "Natural language processing (NLP) is a field of AI that focuses on the interaction between computers and humans through natural language.",
    "Computer vision is a field of AI that trains computers to interpret and understand the visual world.",
    "Reinforcement learning is a type of machine learning where an agent learns to make decisions by taking actions in an environment.",
]
rag.add_documents(documents)
# Query the system
result = rag.query("What is machine learning?")
print(f"Answer: {result['answer']}")
```
Key Features
- ✓ Vector database for semantic search
- ✓ Embedding model for document encoding
- ✓ LLM for contextual response generation
- ✓ Retrieval-augmented generation pipeline
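The following minimal brute-force sketch in plain Python shows what the vector database does during semantic search. The three-dimensional vectors are made-up toy values that stand in for real embeddings (all-MiniLM-L6-v2 produces 384 dimensions). `cosine_similarity` and `top_k` are hypothetical helpers and are not part of Chroma's API. Chroma's `cosine` space reports a distance equal to 1 minus this similarity, so a smaller distance means a closer match.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query: List[float], index: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Score every stored vector exhaustively and return the k best (id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (hypothetical values, for illustration only)
index = {
    "ml": [0.9, 0.1, 0.0],
    "vision": [0.1, 0.9, 0.1],
    "rl": [0.7, 0.0, 0.7],
}
print([doc_id for doc_id, _ in top_k([1.0, 0.0, 0.1], index, k=2)])  # ['ml', 'rl']
```

Exhaustive scoring costs O(n) per query. Vector databases such as Chroma use an approximate HNSW index instead, so search stays fast as the collection grows.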
🔄 RAG Pipeline Overview
1. Retrieval
Find relevant documents
- Vector similarity search
- Keyword matching
- Hybrid approaches
- Re-ranking results
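Hybrid approaches need a way to merge the rankings produced by different retrievers. One common technique for this is reciprocal rank fusion (RRF), sketched below. The sketch assumes that a vector search and a keyword search (for example BM25) have each already returned a ranked list of document IDs; the two lists here are hypothetical. The constant k = 60 is the value conventionally used with RRF and is not something this page prescribes.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in
    (ranks start at 1); k dampens the advantage of the very top positions.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results of a vector search and a keyword search for the same query
semantic = ["doc_2", "doc_0", "doc_4"]
keyword = ["doc_0", "doc_3", "doc_2"]
print(reciprocal_rank_fusion([semantic, keyword]))  # ['doc_0', 'doc_2', 'doc_3', 'doc_4']
```

RRF uses only ranks, so it avoids comparing cosine distances with keyword scores, which live on different scales. If a re-ranker such as a cross-encoder is used, it would typically run on the fused list afterwards.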
2. Augmentation
Enhance the query context
- Document chunking
- Context window management
- Prompt construction
- Context filtering
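Chunking and context window management can be sketched with two small functions. The first splits a document into overlapping fixed-size chunks. The second keeps the highest-ranked chunks until a length budget is reached. This sketch uses word counts as a stand-in for tokens; a real system would measure length with the LLM's own tokenizer. The function names and default sizes are illustrative assumptions.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into chunks of chunk_size words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


def pack_context(ranked_chunks: List[str], max_words: int) -> List[str]:
    """Keep chunks in ranked order until the next one would exceed the word budget."""
    packed, used = [], 0
    for chunk in ranked_chunks:
        n = len(chunk.split())
        if used + n > max_words:
            break
        packed.append(chunk)
        used += n
    return packed
```

Overlap reduces the chance that a fact split across a chunk boundary is lost to retrieval. The trade-off is some duplicated text in the index.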
3. Generation
Generate contextual response
- LLM processing
- Context-aware answers
- Source attribution
- Response formatting
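Source attribution can work in three steps:

1. Number the retrieved documents in the prompt.
2. Ask the model to cite those numbers.
3. Map the citations back to document IDs afterwards.

This is a minimal sketch under two assumptions. Each document is a dict with `id` and `content` keys, the same keys `retrieve_documents` returns above. The model follows the `[n]` citation instruction, which is not guaranteed. `build_cited_prompt` and `cited_sources` are hypothetical helper names.

```python
import re
from typing import Dict, List


def build_cited_prompt(question: str, docs: List[Dict[str, str]]) -> str:
    """Number each retrieved document so the model can cite it as [1], [2], ..."""
    sources = "\n".join(
        f"[{i}] ({doc['id']}) {doc['content']}" for i, doc in enumerate(docs, start=1)
    )
    return (
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\n"
        "Answer using only the sources above and cite them like [1]. "
        "If the sources are insufficient, say so."
    )


def cited_sources(answer: str, docs: List[Dict[str, str]]) -> List[str]:
    """Map [n] markers in the answer back to document IDs, ignoring out-of-range numbers."""
    numbers = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    return [docs[n - 1]["id"] for n in numbers if 1 <= n <= len(docs)]
```

Ignoring out-of-range citations guards against the model referencing a source that was never provided.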
⚠️ Common RAG Challenges & Solutions
Challenge → Solution
- ⚠ Retrieving irrelevant documents → ✓ Hybrid search (semantic + keyword)
- ⚠ Context window limitations → ✓ Smart chunking strategies
- ⚠ Inconsistent chunk quality → ✓ Document pre-processing
- ⚠ Slow retrieval performance → ✓ Vector database optimization
- ⚠ Outdated embeddings → ✓ Incremental index updates
- ⚠ Complex multi-hop questions → ✓ Multi-step retrieval pipelines
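Incremental index updates avoid re-embedding the whole corpus every time documents change. This minimal sketch assumes each document's ID is derived from a hash of its text. Comparing those IDs with the IDs already in the index shows which documents need embedding and which should be deleted. The helpers are hypothetical. With Chroma, the results could be passed to `collection.upsert` and `collection.delete`.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def content_id(text: str) -> str:
    """Stable ID derived from the document text (same text -> same ID)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def plan_update(indexed_ids: Set[str], current_docs: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Compare the index with the current corpus.

    Returns (to_add, to_delete): new or changed documents keyed by ID, and the
    IDs whose text no longer exists. Unchanged documents appear in neither.
    """
    current = {content_id(doc): doc for doc in current_docs}
    to_add = {doc_id: doc for doc_id, doc in current.items() if doc_id not in indexed_ids}
    to_delete = sorted(indexed_ids - set(current))
    return to_add, to_delete
```

An edited document gets a new hash, so it shows up once in `to_add` (new text) and once in `to_delete` (old text). Only that document is re-embedded.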