# 🔍 RAG Fundamentals: Retrieval-Augmented Generation
import ContributionButtons from '../../../../components/ContributionButtons.astro';
import UsageTracker from '../../../../components/UsageTracker.astro';
import AuthorshipBadge from '../../../../components/AuthorshipBadge.astro';
import GreaterGoodBadge from '../../../../components/GreaterGoodBadge.astro';
import CookbookAsCode from '../../../../components/CookbookAsCode.astro';
import InteractiveQuiz from '../../../../components/InteractiveQuiz.astro';
import UnderstandingButton from '../../../../components/UnderstandingButton.astro';
import LearningPath from '../../../../components/LearningPath.astro';

## 🌱 Seedling Concept

Label: Grounding AI in Real Data
RAG (Retrieval-Augmented Generation) transforms AI agents from creative storytellers into accurate information providers by grounding their responses in retrieved, factual data.

## What is RAG?

### The Core Idea

Instead of relying solely on the model's training data, RAG:
- Retrieves relevant information from a knowledge base
- Augments the model’s prompt with retrieved context
- Generates a response grounded in actual data

```text
User Query: "What's our return policy?"
                  │
                  ▼
         [Similarity Search]
                  │
                  ▼
     [Retrieve Policy Documents]
                  │
                  ▼
    [Augment Prompt with Context]
                  │
                  ▼
    [Generate Grounded Response]
```

## 💡 Why RAG Matters

### Without RAG

```python
# Model might hallucinate or provide outdated information
query = "What's our return policy?"
response = model.generate(query)
# Response: "I believe you have 30 days..." (could be wrong)
```

### With RAG

```python
# Model grounds its response in actual documents
query = "What's our return policy?"
relevant_docs = retrieve(query, knowledge_base)
context = f"Policy documents: {relevant_docs}"
response = model.generate(query, context=context)
# Response: "According to your current policy, customers have 45 days..."
```

## 🌿 RAG Architecture

### Three-Stage Pipeline

```text
┌─────────────────────────────────────────────────┐
│ Stage 1: Indexing                               │
├─────────────────────────────────────────────────┤
│ Document → Chunks → Embeddings → Vector Store   │
└─────────────────────────────────────────────────┘
                        ↓
┌─────────────────────────────────────────────────┐
│ Stage 2: Retrieval                              │
├─────────────────────────────────────────────────┤
│ Query → Embedding → Similarity Search → Top-K   │
└─────────────────────────────────────────────────┘
                        ↓
┌─────────────────────────────────────────────────┐
│ Stage 3: Generation                             │
├─────────────────────────────────────────────────┤
│ Context + Query → LLM → Grounded Response       │
└─────────────────────────────────────────────────┘
```

## 🔬 Deep Dive: Building a RAG System

### Step 1: Generating Embeddings and Indexing

Converting data into vector embeddings for semantic search:

```python
from typing import List

import numpy as np


class EmbeddingGenerator:
    """Converts text into vector embeddings."""

    def __init__(self, model_name="text-embedding-004"):
        self.embedding_model = load_embedding_model(model_name)

    def generate_embeddings(self, texts: List[str]) -> np.ndarray:
        """Generate embeddings for a list of texts."""
        embeddings = self.embedding_model.embed(texts)
        return embeddings


class DocumentIndexer:
    """Indexes documents for retrieval."""

    def __init__(self):
        self.embedder = EmbeddingGenerator()
        self.chunks = []
        self.embeddings = []

    def chunk_document(self, document: str, chunk_size: int = 500) -> List[str]:
        """Split a document into manageable chunks."""
        words = document.split()
        chunks = []

        for i in range(0, len(words), chunk_size):
            chunk = ' '.join(words[i:i + chunk_size])
            chunks.append(chunk)

        return chunks

    def index_documents(self, documents: List[str]):
        """Process and index documents."""
        for doc in documents:
            # Split into chunks
            doc_chunks = self.chunk_document(doc)
            self.chunks.extend(doc_chunks)

            # Generate embeddings
            chunk_embeddings = self.embedder.generate_embeddings(doc_chunks)
            self.embeddings.extend(chunk_embeddings)

        print(f"Indexed {len(self.chunks)} chunks from {len(documents)} documents")
```

```python
# Example usage
indexer = DocumentIndexer()
documents = [
    "Our return policy allows customers to return items within 45 days...",
    "Shipping is free for orders over $50. Standard shipping takes 3-5 days...",
    "We offer a price match guarantee. If you find a lower price..."
]
indexer.index_documents(documents)
```
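
The `EmbeddingGenerator` above relies on a `load_embedding_model` helper that isn't shown. As a rough sketch of what a concrete backend could look like, here is a version built on the open-source `sentence-transformers` library — an assumption on our part; the original may instead call a hosted embeddings API such as `text-embedding-004`:

```python
# Hypothetical concrete embedding backend
# (assumes: pip install sentence-transformers; any embeddings API works similarly).
from typing import List

import numpy as np
from sentence_transformers import SentenceTransformer


class SentenceTransformerEmbedder:
    """Drop-in replacement for EmbeddingGenerator backed by a local model."""

    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)

    def generate_embeddings(self, texts: List[str]) -> np.ndarray:
        # Returns an array of shape (len(texts), embedding_dim)
        return self.model.encode(texts, convert_to_numpy=True)
```

Swapping this in for `EmbeddingGenerator` leaves the rest of the indexer unchanged.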

### Step 2: Storing and Indexing

Storing embeddings in a vector store for efficient similarity search (in production, typically a managed vector database):

```python
class VectorStore:
    """High-performance vector store for similarity search."""

    def __init__(self, dimension: int = 768):
        self.dimension = dimension
        self.vectors = []
        self.metadata = []
        self.index = None

    def add_vectors(self, vectors: np.ndarray, metadata: List[dict]):
        """Add vectors with metadata to the store."""
        self.vectors.extend(vectors)
        self.metadata.extend(metadata)

        # Build specialized index for fast search
        self.build_index()

    def build_index(self):
        """Build an optimized index for similarity search."""
        # In production, use libraries like FAISS, Pinecone, or Chroma
        self.index = self._create_efficient_index(self.vectors)

    def search(self, query_vector: np.ndarray, top_k: int = 5) -> List[dict]:
        """Find the top-k most similar vectors."""
        # Calculate similarity scores
        similarities = self._compute_similarity(query_vector, self.vectors)

        # Get top-k indices
        top_indices = np.argsort(similarities)[-top_k:][::-1]

        # Return results with metadata
        results = [
            {
                "content": self.metadata[idx]["content"],
                "score": similarities[idx],
                "metadata": self.metadata[idx]
            }
            for idx in top_indices
        ]

        return results

    def _compute_similarity(self, query: np.ndarray, vectors: List[np.ndarray]) -> np.ndarray:
        """Compute cosine similarity between the query and all stored vectors."""
        vectors_array = np.array(vectors)
        similarities = np.dot(vectors_array, query) / (
            np.linalg.norm(vectors_array, axis=1) * np.linalg.norm(query)
        )
        return similarities
```

```python
# Example usage
vector_store = VectorStore()

# Store indexed chunks
vector_store.add_vectors(
    vectors=indexer.embeddings,
    metadata=[{"content": chunk, "source": "policy_docs"} for chunk in indexer.chunks]
)
```
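
The `build_index` comment above points at FAISS, Pinecone, or Chroma for production use. For reference, here is a minimal sketch of the same store backed by FAISS; the class name `FaissVectorStore` and the normalization choice are ours, not part of the original:

```python
# Illustrative FAISS-backed variant (assumes: pip install faiss-cpu).
import faiss
import numpy as np


class FaissVectorStore:
    """Cosine similarity search via inner product over L2-normalized vectors."""

    def __init__(self, dimension: int = 768):
        self.index = faiss.IndexFlatIP(dimension)  # exact inner-product index
        self.metadata = []

    def add_vectors(self, vectors, metadata):
        vecs = np.asarray(vectors, dtype="float32")
        faiss.normalize_L2(vecs)  # normalize so inner product == cosine similarity
        self.index.add(vecs)
        self.metadata.extend(metadata)

    def search(self, query_vector, top_k: int = 5):
        query = np.asarray([query_vector], dtype="float32")
        faiss.normalize_L2(query)
        scores, indices = self.index.search(query, top_k)
        return [
            {"content": self.metadata[i]["content"], "score": float(s), "metadata": self.metadata[i]}
            for s, i in zip(scores[0], indices[0])
            if i != -1  # FAISS pads with -1 when fewer than top_k vectors exist
        ]
```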

### Step 3: Retrieval and Reasoning

Query processing and context retrieval:

```python
class RAGSystem:
    """Complete RAG implementation."""

    def __init__(self, vector_store: VectorStore, llm_model):
        self.vector_store = vector_store
        self.llm = llm_model
        self.embedder = EmbeddingGenerator()

    def retrieve(self, query: str, top_k: int = 3) -> List[dict]:
        """Retrieve relevant documents for a query."""
        # Convert query to embedding
        query_embedding = self.embedder.generate_embeddings([query])[0]

        # Search vector store
        results = self.vector_store.search(query_embedding, top_k=top_k)

        return results

    def generate(self, query: str, context: List[dict]) -> str:
        """Generate a response using the retrieved context."""
        # Format context
        context_text = "\n\n".join([
            f"[Source {i+1}] {doc['content']}"
            for i, doc in enumerate(context)
        ])

        # Build prompt
        prompt = f"""
        Answer the following question using ONLY the provided context.
        If the answer cannot be found in the context, say so.

        Context: {context_text}

        Question: {query}

        Answer:
        """

        # Generate response
        response = self.llm.generate(prompt)

        return response

    def query(self, user_question: str) -> dict:
        """Complete RAG pipeline."""
        # Retrieve relevant documents
        relevant_docs = self.retrieve(user_question, top_k=3)

        # Generate grounded response
        answer = self.generate(user_question, relevant_docs)

        return {
            "answer": answer,
            "sources": relevant_docs,
            "query": user_question
        }


# Example usage
rag = RAGSystem(vector_store, llm_model)

result = rag.query("What is your return policy?")
print(f"Answer: {result['answer']}")
print("\nSources used:")
for i, source in enumerate(result['sources']):
    print(f"{i+1}. {source['content'][:100]}... (score: {source['score']:.2f})")
```

## ⚡ Quick Win: Simple RAG in 30 Lines

```python
class SimpleRAG:
    """Minimal RAG implementation."""

    def __init__(self, documents: List[str]):
        self.documents = documents
        self.embedder = EmbeddingModel()
        self.llm = LLM()

        # Index documents
        self.doc_embeddings = self.embedder.embed(documents)

    def answer(self, question: str) -> str:
        # Embed question
        q_embedding = self.embedder.embed([question])[0]

        # Find most similar documents
        similarities = [
            cosine_similarity(q_embedding, doc_emb)
            for doc_emb in self.doc_embeddings
        ]
        top_3_idx = sorted(range(len(similarities)),
                           key=lambda i: similarities[i],
                           reverse=True)[:3]

        # Get relevant context
        context = "\n".join([self.documents[i] for i in top_3_idx])

        # Generate answer
        prompt = f"Context: {context}\n\nQuestion: {question}\nAnswer:"
        return self.llm.generate(prompt)
```

```python
# Usage
docs = ["Return policy: 45 days", "Shipping: Free over $50"]
rag = SimpleRAG(docs)
print(rag.answer("What's the return policy?"))
```
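
`SimpleRAG` assumes `EmbeddingModel`, `LLM`, and `cosine_similarity` helpers exist. The first two stand in for whichever embedding and generation clients you use; the similarity helper is small enough to sketch with NumPy:

```python
# Minimal cosine similarity helper assumed by SimpleRAG above.
import numpy as np


def cosine_similarity(a, b) -> float:
    """dot(a, b) / (|a| * |b|): 1.0 means same direction, 0.0 means orthogonal."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```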

## 🌳 Advanced: Hybrid Search

### Combining Dense and Sparse Retrieval

```python
class HybridRAG:
    """RAG with both semantic and keyword search."""

    def __init__(self):
        self.dense_retriever = DenseRetriever()    # Vector embeddings
        self.sparse_retriever = SparseRetriever()  # BM25/TF-IDF

    def hybrid_retrieve(self, query: str, top_k: int = 5) -> List[dict]:
        """Combine dense and sparse retrieval."""
        # Get results from both retrievers
        dense_results = self.dense_retriever.search(query, top_k=top_k * 2)
        sparse_results = self.sparse_retriever.search(query, top_k=top_k * 2)

        # Reciprocal rank fusion
        combined_scores = {}
        for rank, result in enumerate(dense_results):
            doc_id = result['id']
            combined_scores[doc_id] = combined_scores.get(doc_id, 0) + 1 / (rank + 60)

        for rank, result in enumerate(sparse_results):
            doc_id = result['id']
            combined_scores[doc_id] = combined_scores.get(doc_id, 0) + 1 / (rank + 60)

        # Return top-k by combined score
        top_docs = sorted(combined_scores.items(),
                          key=lambda x: x[1],
                          reverse=True)[:top_k]

        return [self.get_document(doc_id) for doc_id, _ in top_docs]
```
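
`DenseRetriever` and `SparseRetriever` are left abstract here. For the sparse side, a rough BM25-based sketch — assuming the `rank_bm25` package; the class name and return shape are illustrative, matching the `id` field the fusion loop expects:

```python
# Illustrative BM25 sparse retriever (assumes: pip install rank-bm25).
from typing import List

from rank_bm25 import BM25Okapi


class BM25SparseRetriever:
    """Keyword-based retrieval over an in-memory list of documents."""

    def __init__(self, documents: List[str]):
        self.documents = documents
        # Naive whitespace tokenization; swap in a real tokenizer for production
        self.bm25 = BM25Okapi([doc.lower().split() for doc in documents])

    def search(self, query: str, top_k: int = 5) -> List[dict]:
        scores = self.bm25.get_scores(query.lower().split())
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
        return [
            {"id": i, "content": self.documents[i], "score": float(scores[i])}
            for i in ranked
        ]
```

Each hit carries an `id`, so reciprocal rank fusion can merge it with the dense results above.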

## 🎯 Key Takeaways

- RAG grounds responses in retrieved documents instead of relying only on training data
- The pipeline has three stages: indexing, retrieval, and generation
- Chunk size, metadata, and relevance thresholds strongly affect retrieval quality
- Hybrid search combines dense (semantic) and sparse (keyword) retrieval
- Context window limits how much you can retrieve

## 🎓 Knowledge Check

<InteractiveQuiz
  quizId="rag-indexing"
  question="In the RAG pipeline, which stage is responsible for converting documents into vector embeddings and storing them?"
  options={["Retrieval", "Generation", "Indexing", "Augmentation"]}
  correctAnswer={2}
  explanation="Indexing is the foundational stage where documents are chunked and embedded into a vector space for later search."
/>

## Common Pitfalls

### 1. Chunks Too Large

```python
# ❌ Bad: Loses granularity
chunk_size = 5000

# ✅ Good: Balanced granularity
chunk_size = 500
overlap = 50  # Preserve context at boundaries
```
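
The `chunk_document` method in Step 1 splits on word count alone. A sketch of a sliding-window variant that applies the overlap recommended above — an illustrative variation, not the original implementation:

```python
# Sliding-window chunking with overlap (illustrative variation on chunk_document).
from typing import List


def chunk_with_overlap(document: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split into word chunks, repeating `overlap` words across chunk boundaries."""
    words = document.split()
    step = chunk_size - overlap  # advance by less than a full chunk so chunks overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(' '.join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already covers the tail of the document
    return chunks
```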

### 2. Ignoring Metadata

```python
# ✅ Store useful metadata
metadata = {
    "content": chunk,
    "source": "docs/policy.pdf",
    "page": 3,
    "timestamp": "2024-01-15",
    "category": "policy"
}
```

### 3. No Relevance Threshold

```python
# ✅ Filter low-quality matches
results = vector_store.search(query, top_k=5)
filtered = [r for r in results if r['score'] > 0.7]
```

## Next Steps

Continue to Agentic RAG to learn how to make your RAG system more intelligent and autonomous.
💡 Pro Tip: Start with simple RAG, then optimize based on your specific retrieval quality needs.