Retrieval-Augmented Generation (RAG)

Master RAG systems for building accurate, context-aware AI applications that combine retrieval and generation

35 min read · Advanced

RAG combines the power of information retrieval with language generation to provide accurate, contextual answers based on external knowledge sources.

  • Use case: Q&A systems, the most common enterprise AI application
  • Accuracy boost: 30-50% improvement over pure generation
  • Hallucination reduction: 60-80%, thanks to grounding in real documents
  • Knowledge updates: real-time, with no model retraining needed

RAG Architecture

Complete RAG system with retrieval, augmentation, and generation components

# Complete RAG System Implementation
import chromadb
from sentence_transformers import SentenceTransformer
import openai
from typing import Any, Dict, List, Optional

class RAGSystem:
    def __init__(self, collection_name: str = "documents"):
        # Initialize vector database
        self.chroma_client = chromadb.Client()
        # get_or_create avoids an error if the collection already exists
        self.collection = self.chroma_client.get_or_create_collection(
            name=collection_name,
            metadata={"hnsw:space": "cosine"}
        )
        
        # Initialize embedding model
        self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
        
        # Initialize LLM client
        self.llm_client = openai.OpenAI()
        
    def add_documents(self, documents: List[str], metadatas: Optional[List[Dict]] = None):
        """Add documents to the vector store"""
        # Generate embeddings
        embeddings = self.embedding_model.encode(documents).tolist()
        
        # Generate unique IDs, offset by the current count so repeated
        # calls don't collide
        start = self.collection.count()
        ids = [f"doc_{start + i}" for i in range(len(documents))]
        
        # Add to collection
        self.collection.add(
            documents=documents,
            embeddings=embeddings,
            metadatas=metadatas,  # None is fine when no metadata is given
            ids=ids
        )
        
        print(f"Added {len(documents)} documents to the collection")
    
    def retrieve_documents(self, query: str, top_k: int = 5) -> List[Dict]:
        """Retrieve relevant documents for a query"""
        # Generate query embedding
        query_embedding = self.embedding_model.encode([query]).tolist()
        
        # Search in vector database
        results = self.collection.query(
            query_embeddings=query_embedding,
            n_results=top_k
        )
        
        # Format results
        retrieved_docs = []
        for i in range(len(results['documents'][0])):
            retrieved_docs.append({
                'content': results['documents'][0][i],
                'metadata': results['metadatas'][0][i],
                'distance': results['distances'][0][i],
                'id': results['ids'][0][i]
            })
        
        return retrieved_docs
    
    def generate_response(self, query: str, retrieved_docs: List[Dict]) -> str:
        """Generate response using LLM with retrieved context"""
        # Build context from retrieved documents
        context = "\n\n".join([
            f"Document {i+1}: {doc['content']}" 
            for i, doc in enumerate(retrieved_docs)
        ])
        
        # Create prompt with context (built piecewise so the method's
        # indentation doesn't leak into the prompt text)
        prompt = (
            f"Context:\n{context}\n\n"
            f"Question: {query}\n\n"
            "Please answer the question based on the provided context. "
            "If the context doesn't contain enough information to answer "
            "the question, please say so."
        )
        
        # Generate response
        response = self.llm_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful assistant that answers questions based on provided context."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.7,
            max_tokens=500
        )
        
        return response.choices[0].message.content
    
    def query(self, question: str, top_k: int = 5) -> Dict[str, Any]:
        """Complete RAG pipeline: retrieve and generate"""
        # Step 1: Retrieve relevant documents
        retrieved_docs = self.retrieve_documents(question, top_k)
        
        # Step 2: Generate response with context
        response = self.generate_response(question, retrieved_docs)
        
        return {
            'question': question,
            'answer': response,
            'retrieved_documents': retrieved_docs,
            'num_documents_retrieved': len(retrieved_docs)
        }

# Example usage
rag = RAGSystem("knowledge_base")

# Add documents
documents = [
    "Machine learning is a subset of artificial intelligence that enables computers to learn without being explicitly programmed.",
    "Deep learning uses neural networks with multiple layers to model and understand complex patterns in data.",
    "Natural language processing (NLP) is a field of AI that focuses on the interaction between computers and humans through natural language.",
    "Computer vision is a field of AI that trains computers to interpret and understand the visual world.",
    "Reinforcement learning is a type of machine learning where an agent learns to make decisions by taking actions in an environment."
]

rag.add_documents(documents)

# Query the system
result = rag.query("What is machine learning?")
print(f"Answer: {result['answer']}")

Key Features

  • Vector database for semantic search
  • Embedding model for document encoding
  • LLM for contextual response generation
  • Retrieval-augmented generation pipeline

🔄 RAG Pipeline Overview

1. Retrieval

Find relevant documents

  • Vector similarity search
  • Keyword matching
  • Hybrid approaches
  • Re-ranking results (sketched below)
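
A common re-ranking pattern is to over-retrieve with the fast bi-encoder, then re-score the candidates with a slower but more accurate cross-encoder. The sketch below assumes the sentence-transformers CrossEncoder class and the document dicts returned by retrieve_documents above; the checkpoint name is one popular public model, not the only option.

# Cross-encoder re-ranking of retrieved candidates (sketch)
from sentence_transformers import CrossEncoder

# One widely used public re-ranking checkpoint (an assumption; swap freely)
reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

def rerank(query: str, docs: list, top_k: int = 3) -> list:
    """Re-score (query, document) pairs jointly and keep the top_k best."""
    scores = reranker.predict([(query, d['content']) for d in docs])
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

# Typical use: over-retrieve, then re-rank down to what fits the prompt
# candidates = rag.retrieve_documents(question, top_k=20)
# best = rerank(question, candidates)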

2. Augmentation

Enhance the query context

  • Document chunking (see the sketch after this list)
  • Context window management
  • Prompt construction
  • Context filtering
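
A minimal word-based chunker with overlap illustrates the chunking idea; the chunk_size and overlap defaults here are placeholders, and production systems often split on sentence or section boundaries instead.

# Fixed-size chunking with overlap (sketch)
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list:
    """Split text into word-based chunks; the overlap keeps sentences that
    straddle a boundary intact in the neighboring chunk."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks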

3. Generation

Generate contextual response

  • LLM processing
  • Context-aware answers
  • Source attribution (sketched after this list)
  • Response formatting
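
For source attribution, one simple approach is to number each retrieved chunk in the prompt and instruct the model to cite those numbers. A sketch, assuming the same document dicts that retrieve_documents returns:

# Prompt construction with numbered sources for citation (sketch)
def build_cited_prompt(query: str, retrieved_docs: list) -> str:
    """Number each source so the model can cite them as [1], [2], ..."""
    sources = "\n\n".join(
        f"[{i + 1}] {doc['content']}" for i, doc in enumerate(retrieved_docs)
    )
    return (
        f"Sources:\n{sources}\n\n"
        f"Question: {query}\n\n"
        "Answer using only the sources above and cite each claim with its "
        "source number, e.g. [1]. If the sources don't answer the question, say so."
    )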

⚠️ Common RAG Challenges & Solutions

Each challenge pairs with a practical mitigation:

  • Retrieving irrelevant documents → hybrid search, semantic + keyword (sketched after this list)
  • Context window limitations → smart chunking strategies
  • Inconsistent chunk quality → document pre-processing
  • Slow retrieval performance → vector database optimization
  • Outdated embeddings → incremental index updates
  • Complex multi-hop questions → multi-step retrieval pipelines
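
As an example of the first pairing, hybrid search can be sketched by blending BM25 keyword scores with the similarities the vector store already computes. This assumes the third-party rank-bm25 package, and that vector_scores holds similarities (for Chroma's cosine space, 1 minus the returned distance):

# Hybrid search: blend BM25 keyword scores with vector similarity (sketch)
from rank_bm25 import BM25Okapi  # assumes: pip install rank-bm25
import numpy as np

def hybrid_scores(query: str, documents: list, vector_scores: np.ndarray,
                  alpha: float = 0.5) -> np.ndarray:
    """Blend lexical and semantic scores; alpha weights the keyword side."""
    bm25 = BM25Okapi([doc.lower().split() for doc in documents])
    keyword_scores = np.array(bm25.get_scores(query.lower().split()))

    def normalize(x: np.ndarray) -> np.ndarray:
        # Min-max normalize onto [0, 1] so the two scales are comparable
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    return alpha * normalize(keyword_scores) + (1 - alpha) * normalize(vector_scores)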

📝 RAG Systems Quiz


What are the three main components of a RAG system?