RAG Fundamentals: Retrieval-Augmented Generation
🌱 Seedling Concept
Label: Grounding AI in Real Data
RAG (Retrieval-Augmented Generation) transforms AI agents from creative storytellers into accurate information providers by grounding their responses in retrieved, factual data.
What is RAG?
The Core Idea
Instead of relying solely on the model's training data, RAG:
- Retrieves relevant information from a knowledge base
- Augments the model's prompt with retrieved context
- Generates a response grounded in actual data
User Query: "What's our return policy?" โ โผ [Similarity Search] โ โผ [Retrieve Policy Documents] โ โผ [Augment Prompt with Context] โ โผ [Generate Grounded Response]๐ก Why RAG Matters
Without RAG
```python
# Model might hallucinate or provide outdated information
query = "What's our return policy?"
response = model.generate(query)
# Response: "I believe you have 30 days..." (could be wrong)
```

With RAG
```python
# Model grounds response in actual documents
query = "What's our return policy?"
relevant_docs = retrieve(query, knowledge_base)
context = f"Policy documents: {relevant_docs}"
response = model.generate(query, context=context)
# Response: "According to your current policy, customers have 45 days..."
```

🌿 RAG Architecture
Three-Stage Pipeline
Section titled โThree-Stage Pipelineโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ Stage 1: Indexing โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโคโ Document โ Chunks โ Embeddings โ Vector Store โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ Stage 2: Retrieval โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโคโ Query โ Embedding โ Similarity Search โ Top-K โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ Stage 3: Generation โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโคโ Context + Query โ LLM โ Grounded Response โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ๐ฌ Deep Dive: Building a RAG System
Step 1: Generating Embeddings and Indexing
Converting data into vector embeddings for semantic search:
```python
from typing import List

import numpy as np

class EmbeddingGenerator:
    """
    Converts text into vector embeddings
    """
    def __init__(self, model_name="text-embedding-004"):
        self.embedding_model = load_embedding_model(model_name)

    def generate_embeddings(self, texts: List[str]) -> np.ndarray:
        """
        Generate embeddings for a list of texts
        """
        embeddings = self.embedding_model.embed(texts)
        return embeddings

class DocumentIndexer:
    """
    Indexes documents for retrieval
    """
    def __init__(self):
        self.embedder = EmbeddingGenerator()
        self.chunks = []
        self.embeddings = []

    def chunk_document(self, document: str, chunk_size: int = 500) -> List[str]:
        """
        Split a document into manageable chunks (chunk_size is measured in words)
        """
        words = document.split()
        chunks = []

        for i in range(0, len(words), chunk_size):
            chunk = ' '.join(words[i:i + chunk_size])
            chunks.append(chunk)

        return chunks

    def index_documents(self, documents: List[str]):
        """
        Process and index documents
        """
        for doc in documents:
            # Split into chunks
            doc_chunks = self.chunk_document(doc)
            self.chunks.extend(doc_chunks)

            # Generate embeddings
            chunk_embeddings = self.embedder.generate_embeddings(doc_chunks)
            self.embeddings.extend(chunk_embeddings)

        print(f"Indexed {len(self.chunks)} chunks from {len(documents)} documents")

# Example usage
indexer = DocumentIndexer()
documents = [
    "Our return policy allows customers to return items within 45 days...",
    "Shipping is free for orders over $50. Standard shipping takes 3-5 days...",
    "We offer a price match guarantee. If you find a lower price..."
]
indexer.index_documents(documents)
```
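The `load_embedding_model` call above is a stand-in rather than a specific library. As one minimal sketch of a concrete embedder, assuming the `sentence-transformers` package is available (the model name below is an illustrative choice, not a requirement of this cookbook):

```python
# Minimal sketch of a concrete embedder, assuming sentence-transformers is installed.
# The model name is an example; swap in whatever embedding model you actually use.
from typing import List

import numpy as np
from sentence_transformers import SentenceTransformer

class SentenceTransformerEmbedder:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)

    def generate_embeddings(self, texts: List[str]) -> np.ndarray:
        # encode() returns a 2D array with one embedding vector per input text
        return self.model.encode(texts, convert_to_numpy=True)
```

Because it exposes the same `generate_embeddings` method as `EmbeddingGenerator`, a class like this can be dropped into `DocumentIndexer` without other changes.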
Step 2: Storing and Indexing
A vector store that supports efficient similarity search:
```python
class VectorStore:
    """
    High-performance vector database for similarity search
    """
    def __init__(self, dimension: int = 768):
        self.dimension = dimension
        self.vectors = []
        self.metadata = []
        self.index = None

    def add_vectors(self, vectors: np.ndarray, metadata: List[dict]):
        """
        Add vectors with metadata to the store
        """
        self.vectors.extend(vectors)
        self.metadata.extend(metadata)

        # Build specialized index for fast search
        self.build_index()

    def build_index(self):
        """
        Build optimized index for similarity search
        """
        # In production, use libraries like FAISS, Pinecone, or Chroma
        self.index = self._create_efficient_index(self.vectors)

    def search(self, query_vector: np.ndarray, top_k: int = 5) -> List[dict]:
        """
        Find top-k most similar vectors
        """
        # Calculate similarity scores
        similarities = self._compute_similarity(query_vector, self.vectors)

        # Get top-k indices
        top_indices = np.argsort(similarities)[-top_k:][::-1]

        # Return results with metadata
        results = [
            {
                "content": self.metadata[idx]["content"],
                "score": similarities[idx],
                "metadata": self.metadata[idx]
            }
            for idx in top_indices
        ]

        return results

    def _compute_similarity(self, query: np.ndarray, vectors: List[np.ndarray]) -> np.ndarray:
        """
        Compute cosine similarity
        """
        vectors_array = np.array(vectors)
        similarities = np.dot(vectors_array, query) / (
            np.linalg.norm(vectors_array, axis=1) * np.linalg.norm(query)
        )
        return similarities

# Example usage
vector_store = VectorStore()

# Store indexed chunks
vector_store.add_vectors(
    vectors=indexer.embeddings,
    metadata=[{"content": chunk, "source": "policy_docs"} for chunk in indexer.chunks]
)
```
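The `_create_efficient_index` helper above is deliberately left abstract. As a hedged sketch of what a production-grade index could look like with FAISS (one of the libraries named in the comment), using an illustrative wrapper class that is not part of this cookbook's API:

```python
# Sketch of a FAISS-backed index, assuming the faiss package is installed.
import faiss
import numpy as np

class FaissIndex:
    def __init__(self, dimension: int = 768):
        # Inner-product index; with L2-normalized vectors this equals cosine similarity
        self.index = faiss.IndexFlatIP(dimension)

    def add(self, vectors: np.ndarray) -> None:
        vectors = np.ascontiguousarray(vectors, dtype="float32")
        faiss.normalize_L2(vectors)
        self.index.add(vectors)

    def search(self, query: np.ndarray, top_k: int = 5):
        query = np.ascontiguousarray(query.reshape(1, -1), dtype="float32")
        faiss.normalize_L2(query)
        scores, indices = self.index.search(query, top_k)
        return scores[0], indices[0]
```

FAISS returns positions into the vectors you added, so the metadata lookup pattern in `VectorStore.search` stays the same.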
Step 3: Retrieval and Reasoning
Query processing and context retrieval:
```python
class RAGSystem:
    """
    Complete RAG implementation
    """
    def __init__(self, vector_store: VectorStore, llm_model):
        self.vector_store = vector_store
        self.llm = llm_model
        self.embedder = EmbeddingGenerator()

    def retrieve(self, query: str, top_k: int = 3) -> List[dict]:
        """
        Retrieve relevant documents for a query
        """
        # Convert query to embedding
        query_embedding = self.embedder.generate_embeddings([query])[0]

        # Search vector store
        results = self.vector_store.search(query_embedding, top_k=top_k)

        return results

    def generate(self, query: str, context: List[dict]) -> str:
        """
        Generate response using retrieved context
        """
        # Format context
        context_text = "\n\n".join([
            f"[Source {i+1}] {doc['content']}"
            for i, doc in enumerate(context)
        ])

        # Build prompt
        prompt = f"""
        Answer the following question using ONLY the provided context.
        If the answer cannot be found in the context, say so.

        Context:
        {context_text}

        Question: {query}

        Answer:
        """

        # Generate response
        response = self.llm.generate(prompt)

        return response

    def query(self, user_question: str) -> dict:
        """
        Complete RAG pipeline
        """
        # Retrieve relevant documents
        relevant_docs = self.retrieve(user_question, top_k=3)

        # Generate grounded response
        answer = self.generate(user_question, relevant_docs)

        return {
            "answer": answer,
            "sources": relevant_docs,
            "query": user_question
        }

# Example usage
rag = RAGSystem(vector_store, llm_model)

result = rag.query("What is your return policy?")
print(f"Answer: {result['answer']}")
print("\nSources used:")
for i, source in enumerate(result['sources']):
    print(f"{i+1}. {source['content'][:100]}... (score: {source['score']:.2f})")
```

⚡ Quick Win: Simple RAG in 30 Lines
```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

class SimpleRAG:
    """Minimal RAG implementation (EmbeddingModel and LLM are placeholders)"""

    def __init__(self, documents: List[str]):
        self.documents = documents
        self.embedder = EmbeddingModel()
        self.llm = LLM()

        # Index documents
        self.doc_embeddings = self.embedder.embed(documents)

    def answer(self, question: str) -> str:
        # Embed question
        q_embedding = self.embedder.embed([question])[0]

        # Find most similar documents
        similarities = [
            cosine_similarity(q_embedding, doc_emb)
            for doc_emb in self.doc_embeddings
        ]
        top_3_idx = sorted(range(len(similarities)),
                           key=lambda i: similarities[i],
                           reverse=True)[:3]

        # Get relevant context
        context = "\n".join([self.documents[i] for i in top_3_idx])

        # Generate answer
        prompt = f"Context: {context}\n\nQuestion: {question}\nAnswer:"
        return self.llm.generate(prompt)

# Usage
docs = ["Return policy: 45 days", "Shipping: Free over $50"]
rag = SimpleRAG(docs)
print(rag.answer("What's the return policy?"))
```

🌳 Advanced: Hybrid Search
Combining Dense and Sparse Retrieval
```python
class HybridRAG:
    """
    RAG with both semantic and keyword search
    """
    def __init__(self):
        self.dense_retriever = DenseRetriever()    # Vector embeddings
        self.sparse_retriever = SparseRetriever()  # BM25/TF-IDF

    def hybrid_retrieve(self, query: str, top_k: int = 5) -> List[dict]:
        """
        Combine dense and sparse retrieval
        """
        # Get results from both retrievers
        dense_results = self.dense_retriever.search(query, top_k=top_k*2)
        sparse_results = self.sparse_retriever.search(query, top_k=top_k*2)

        # Reciprocal rank fusion
        combined_scores = {}
        for rank, result in enumerate(dense_results):
            doc_id = result['id']
            combined_scores[doc_id] = combined_scores.get(doc_id, 0) + 1/(rank + 60)

        for rank, result in enumerate(sparse_results):
            doc_id = result['id']
            combined_scores[doc_id] = combined_scores.get(doc_id, 0) + 1/(rank + 60)

        # Return top-k by combined score
        top_docs = sorted(combined_scores.items(),
                          key=lambda x: x[1],
                          reverse=True)[:top_k]

        return [self.get_document(doc_id) for doc_id, _ in top_docs]
```
🎯 Key Takeaways
- RAG grounds LLM responses in factual, retrieved data
- Three stages: Indexing, Retrieval, Generation
- Vector embeddings enable semantic similarity search
- Chunking strategy affects retrieval quality (see the chunking sketch after this list)
- Context window limits how much you can retrieve
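As referenced above, chunk size and overlap are worth tuning to your corpus. A minimal word-based chunker with boundary overlap (the default sizes are illustrative, not recommendations):

```python
from typing import List

def chunk_with_overlap(document: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split a document into word-based chunks that overlap at the boundaries."""
    words = document.split()
    step = max(chunk_size - overlap, 1)
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```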
Common Pitfalls
1. Chunks Too Large
```python
# ❌ Bad: Loses granularity
chunk_size = 5000

# ✅ Good: Balanced granularity
chunk_size = 500
overlap = 50  # Preserve context at boundaries
```

2. Ignoring Metadata
```python
# ✅ Store useful metadata
metadata = {
    "content": chunk,
    "source": "docs/policy.pdf",
    "page": 3,
    "timestamp": "2024-01-15",
    "category": "policy"
}
```
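Stored metadata also pays off at query time. A small illustrative filter over results from the `VectorStore` defined earlier (the field names match the metadata dict above; `query_embedding` is assumed to come from the embedder):

```python
# Illustrative: narrow retrieved results using stored metadata
results = vector_store.search(query_embedding, top_k=10)
policy_results = [
    r for r in results
    if r["metadata"].get("category") == "policy"
]
```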
3. No Relevance Threshold

```python
# ✅ Filter low-quality matches
results = vector_store.search(query, top_k=5)
filtered = [r for r in results if r['score'] > 0.7]
```

Next Steps
Continue to Agentic RAG to learn how to make your RAG system more intelligent and autonomous.
💡 Pro Tip: Start with simple RAG, then optimize based on your specific retrieval quality needs.