Introduction to RAG-Based Document Q&A
Retrieval-Augmented Generation (RAG) has revolutionized how we interact with documents and knowledge bases. Instead of relying solely on a language model’s pre-trained knowledge, RAG systems retrieve relevant information from your own documents and use it to generate accurate, contextual answers.
In this comprehensive tutorial, you’ll build a complete Document Q&A system that can answer questions about any PDF, Word document, or text file you provide. This is the same technology powering enterprise knowledge management systems, customer support bots, and research assistants.
What You’ll Build
By the end of this tutorial, you’ll have a fully functional system that:
- Ingests PDF, DOCX, and TXT documents
- Splits documents into semantic chunks
- Creates vector embeddings for efficient retrieval
- Finds relevant context for any question
- Generates accurate answers using retrieved context
- Provides a web interface for easy interaction
Understanding RAG Architecture
Before we dive into code, let’s understand how RAG works:
The RAG Pipeline
1. Document Ingestion: Load documents from various formats (PDF, DOCX, TXT)
2. Text Chunking: Split documents into smaller, semantically meaningful chunks
3. Embedding Generation: Convert text chunks into vector embeddings
4. Vector Storage: Store embeddings in a vector database for fast retrieval
5. Query Processing: Convert user questions into embeddings
6. Retrieval: Find the most similar document chunks to the query
7. Generation: Use retrieved context to generate accurate answers
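Conceptually, these seven steps collapse into two phases: an offline indexing phase (steps 1–4) and an online query phase (steps 5–7). The sketch below is illustrative only — the loader, vector store, and LLM are placeholders for the real components we build in the steps that follow:
# Conceptual sketch of the two RAG phases; the concrete loader, splitter,
# vector store, and LLM are built in Steps 1-4 and simply passed in here.
from typing import Callable, List

def index_documents(file_paths: List[str], load_and_split: Callable, vector_store) -> None:
    """Offline phase: load, chunk, embed, and store documents (steps 1-4)."""
    for path in file_paths:
        chunks = load_and_split(path)        # ingestion + chunking
        vector_store.add_documents(chunks)   # embedding + vector storage

def answer_question(question: str, vector_store, llm) -> str:
    """Online phase: retrieve relevant chunks, then generate an answer (steps 5-7)."""
    docs = vector_store.similarity_search(question, k=4)
    context = "\n\n".join(doc.page_content for doc in docs)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm.invoke(prompt).content        # assumes a chat model such as ChatOpenAI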
Why RAG Matters
Traditional language models have limitations:
- Knowledge cutoff dates mean they lack recent information
- They can’t access your private documents
- They sometimes hallucinate or make up facts
RAG solves these problems by grounding responses in your actual documents, providing source attribution, and enabling updates without retraining.
Prerequisites and Setup
Required Libraries
# Create virtual environment
python -m venv rag_env
source rag_env/bin/activate # On Windows: rag_env\Scripts\activate
# Install required packages
pip install langchain langchain-openai langchain-community
pip install chromadb sentence-transformers
pip install pypdf python-docx unstructured
pip install flask flask-cors
pip install openai tiktoken
pip install python-dotenv
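The LLM used in Step 4 (and the optional OpenAI embedding provider) reads OPENAI_API_KEY from the environment, and app.py calls load_dotenv() at startup, so the easiest setup is a .env file in the project root. A minimal example — the key value is a placeholder:
# .env -- loaded by python-dotenv at startup; keep this file out of version control
OPENAI_API_KEY=sk-your-key-here

Project Structure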
document_qa/
├── app.py # Flask web application
├── document_processor.py # Document loading and chunking
├── embeddings.py # Embedding generation
├── vector_store.py # ChromaDB operations
├── qa_chain.py # Question answering logic
├── config.py # Configuration settings
├── requirements.txt # Dependencies
├── templates/
│ └── index.html # Web interface
├── uploads/ # Uploaded documents
└── chroma_db/ # Vector database storage
Step 1: Document Processing
First, let’s create a robust document processor that handles multiple file formats:
# document_processor.py
import os
from typing import List, Optional
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import (
PyPDFLoader,
Docx2txtLoader,
TextLoader,
UnstructuredFileLoader
)
from langchain.schema import Document
class DocumentProcessor:
"""Handles document loading and text chunking."""
def __init__(
self,
chunk_size: int = 1000,
chunk_overlap: int = 200,
separators: Optional[List[str]] = None
):
self.chunk_size = chunk_size
self.chunk_overlap = chunk_overlap
self.separators = separators or ["\n\n", "\n", ".", "!", "?", ",", " ", ""]
self.text_splitter = RecursiveCharacterTextSplitter(
chunk_size=self.chunk_size,
chunk_overlap=self.chunk_overlap,
separators=self.separators,
length_function=len
)
# Map file extensions to loaders
self.loader_map = {
".pdf": PyPDFLoader,
".docx": Docx2txtLoader,
".doc": Docx2txtLoader,
".txt": TextLoader,
}
def load_document(self, file_path: str) -> List[Document]:
"""Load a document from file path."""
_, ext = os.path.splitext(file_path.lower())
if ext not in self.loader_map:
# Try unstructured loader as fallback
loader = UnstructuredFileLoader(file_path)
else:
loader = self.loader_map[ext](file_path)
try:
documents = loader.load()
# Add source metadata
for doc in documents:
doc.metadata["source"] = os.path.basename(file_path)
return documents
except Exception as e:
raise ValueError(f"Error loading {file_path}: {str(e)}")
def split_documents(self, documents: List[Document]) -> List[Document]:
"""Split documents into smaller chunks."""
chunks = self.text_splitter.split_documents(documents)
# Add chunk indices to metadata
for i, chunk in enumerate(chunks):
chunk.metadata["chunk_index"] = i
return chunks
def process_file(self, file_path: str) -> List[Document]:
"""Load and split a single file."""
documents = self.load_document(file_path)
return self.split_documents(documents)
def process_directory(self, directory_path: str) -> List[Document]:
"""Process all supported files in a directory."""
all_chunks = []
for filename in os.listdir(directory_path):
file_path = os.path.join(directory_path, filename)
if os.path.isfile(file_path):
_, ext = os.path.splitext(filename.lower())
if ext in self.loader_map:
try:
chunks = self.process_file(file_path)
all_chunks.extend(chunks)
print(f"Processed: {filename} ({len(chunks)} chunks)")
except Exception as e:
print(f"Error processing {filename}: {e}")
return all_chunks
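Before moving on, it is worth a quick sanity check of the chunker. The snippet below is a sketch — sample.pdf is a placeholder for any document you have on hand:
# Quick check: process one file and inspect the first chunk
from document_processor import DocumentProcessor

processor = DocumentProcessor(chunk_size=1000, chunk_overlap=200)
chunks = processor.process_file("sample.pdf")   # placeholder file name
print(f"{len(chunks)} chunks created")
print(chunks[0].metadata)            # includes 'source' and 'chunk_index'
print(chunks[0].page_content[:200])  # preview the first chunk

Step 2: Embedding Generation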
Now let’s create the embedding system. We’ll support both OpenAI embeddings and free local alternatives:
# embeddings.py
from typing import List, Optional
from langchain_openai import OpenAIEmbeddings
from langchain_community.embeddings import HuggingFaceEmbeddings
import os
class EmbeddingManager:
"""Manages embedding generation with multiple providers."""
def __init__(
self,
provider: str = "openai",
model_name: Optional[str] = None
):
self.provider = provider
if provider == "openai":
self.model_name = model_name or "text-embedding-3-small"
self.embeddings = OpenAIEmbeddings(
model=self.model_name,
openai_api_key=os.getenv("OPENAI_API_KEY")
)
elif provider == "huggingface":
# Free local embeddings - no API key needed
self.model_name = model_name or "sentence-transformers/all-MiniLM-L6-v2"
self.embeddings = HuggingFaceEmbeddings(
model_name=self.model_name,
model_kwargs={"device": "cpu"},
encode_kwargs={"normalize_embeddings": True}
)
else:
raise ValueError(f"Unknown provider: {provider}")
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Generate embeddings for a list of texts."""
return self.embeddings.embed_documents(texts)
def embed_query(self, text: str) -> List[float]:
"""Generate embedding for a single query."""
return self.embeddings.embed_query(text)
def get_embeddings(self):
"""Return the underlying embeddings object for use with vector stores."""
return self.embeddings
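A quick way to confirm the embedding setup (and to see the vector dimensionality your store will index) is to embed a test query — a minimal check using the free local provider:
# Minimal embedding check with the local HuggingFace provider
from embeddings import EmbeddingManager

manager = EmbeddingManager(provider="huggingface")
vector = manager.embed_query("What is retrieval-augmented generation?")
print(len(vector))   # 384 for sentence-transformers/all-MiniLM-L6-v2

Step 3: Vector Store with ChromaDB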
ChromaDB provides efficient vector storage and retrieval:
# vector_store.py
import os
from typing import List, Optional, Dict, Any
from langchain_community.vectorstores import Chroma
from langchain.schema import Document
from embeddings import EmbeddingManager
class VectorStore:
"""Manages vector storage and retrieval using ChromaDB."""
def __init__(
self,
persist_directory: str = "./chroma_db",
collection_name: str = "documents",
embedding_provider: str = "huggingface"
):
self.persist_directory = persist_directory
self.collection_name = collection_name
# Initialize embedding manager
self.embedding_manager = EmbeddingManager(provider=embedding_provider)
# Initialize or load existing vector store
self.vector_store = Chroma(
collection_name=self.collection_name,
embedding_function=self.embedding_manager.get_embeddings(),
persist_directory=self.persist_directory
)
def add_documents(self, documents: List[Document]) -> List[str]:
"""Add documents to the vector store."""
ids = self.vector_store.add_documents(documents)
return ids
def similarity_search(
self,
query: str,
k: int = 4,
filter: Optional[Dict[str, Any]] = None
) -> List[Document]:
"""Search for similar documents."""
return self.vector_store.similarity_search(
query=query,
k=k,
filter=filter
)
def similarity_search_with_score(
self,
query: str,
k: int = 4
) -> List[tuple]:
"""Search with relevance scores."""
return self.vector_store.similarity_search_with_score(
query=query,
k=k
)
def max_marginal_relevance_search(
self,
query: str,
k: int = 4,
fetch_k: int = 20,
lambda_mult: float = 0.5
) -> List[Document]:
"""MMR search for diverse results."""
return self.vector_store.max_marginal_relevance_search(
query=query,
k=k,
fetch_k=fetch_k,
lambda_mult=lambda_mult
)
def delete_collection(self):
"""Delete the entire collection."""
self.vector_store.delete_collection()
def get_collection_stats(self) -> Dict[str, Any]:
"""Get statistics about the collection."""
collection = self.vector_store._collection
return {
"name": self.collection_name,
"count": collection.count(),
"persist_directory": self.persist_directory
}
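Before wiring in an LLM, you can confirm retrieval on its own. The snippet below is a standalone check with two hand-made chunks; note that Chroma's similarity_search_with_score returns distances, so lower scores mean closer matches:
# Retrieval-only check: index two small chunks, then inspect the scores
from langchain.schema import Document
from vector_store import VectorStore

store = VectorStore(embedding_provider="huggingface")
store.add_documents([
    Document(page_content="RAG retrieves relevant chunks before generating an answer.",
             metadata={"source": "notes.txt", "chunk_index": 0}),
    Document(page_content="ChromaDB persists vector embeddings to disk.",
             metadata={"source": "notes.txt", "chunk_index": 1}),
])

for doc, score in store.similarity_search_with_score("How are embeddings stored?", k=2):
    print(round(score, 3), doc.metadata["source"], doc.page_content)

Step 4: Question Answering Chain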
Now let’s build the QA system that ties everything together:
# qa_chain.py
import os
from typing import List, Dict, Any, Optional
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema import Document
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from vector_store import VectorStore
class DocumentQA:
"""Question answering system using RAG."""
def __init__(
self,
vector_store: VectorStore,
model_name: str = "gpt-3.5-turbo",
temperature: float = 0.0,
max_tokens: int = 1000
):
self.vector_store = vector_store
# Initialize LLM
self.llm = ChatOpenAI(
model_name=model_name,
temperature=temperature,
max_tokens=max_tokens,
openai_api_key=os.getenv("OPENAI_API_KEY")
)
# Conversation memory
self.memory = ConversationBufferMemory(
memory_key="chat_history",
return_messages=True,
output_key="answer"
)
# QA prompt template
self.qa_prompt = ChatPromptTemplate.from_messages([
("system", """You are a helpful assistant that answers questions based on the provided context.
Instructions:
- Answer the question using ONLY the information from the context provided
- If the answer is not in the context, say "I could not find this information in the documents"
- Be concise but comprehensive
- If relevant, mention which document the information comes from
- Do not make up information that is not in the context"""),
("human", """Context from documents:
{context}
Question: {question}
Please provide a helpful answer based on the context above.""")
])
def format_context(self, documents: List[Document]) -> str:
"""Format retrieved documents into context string."""
context_parts = []
for i, doc in enumerate(documents, 1):
source = doc.metadata.get("source", "Unknown")
chunk_idx = doc.metadata.get("chunk_index", "N/A")
context_parts.append(
f"[Document {i} - {source} (chunk {chunk_idx})]:\n{doc.page_content}"
)
return "\n\n---\n\n".join(context_parts)
def answer_question(
self,
question: str,
k: int = 4,
use_mmr: bool = True
) -> Dict[str, Any]:
"""Answer a question using RAG."""
# Retrieve relevant documents
if use_mmr:
documents = self.vector_store.max_marginal_relevance_search(
query=question,
k=k,
fetch_k=k * 3
)
else:
documents = self.vector_store.similarity_search(
query=question,
k=k
)
if not documents:
return {
"answer": "No relevant documents found to answer your question.",
"sources": [],
"context": ""
}
# Format context
context = self.format_context(documents)
# Generate answer
messages = self.qa_prompt.format_messages(
context=context,
question=question
)
response = self.llm.invoke(messages)
# Extract source information
sources = list(set([
doc.metadata.get("source", "Unknown")
for doc in documents
]))
return {
"answer": response.content,
"sources": sources,
"context": context,
"num_chunks_retrieved": len(documents)
}
def get_conversational_chain(self):
"""Get a conversational chain for multi-turn conversations."""
return ConversationalRetrievalChain.from_llm(
llm=self.llm,
retriever=self.vector_store.vector_store.as_retriever(
search_type="mmr",
search_kwargs={"k": 4, "fetch_k": 12}
),
memory=self.memory,
return_source_documents=True,
verbose=False
)
def chat(self, question: str) -> Dict[str, Any]:
"""Handle conversational Q&A with memory."""
chain = self.get_conversational_chain()
result = chain({"question": question})
return {
"answer": result["answer"],
"sources": [
doc.metadata.get("source", "Unknown")
for doc in result.get("source_documents", [])
]
}
def clear_memory(self):
"""Clear conversation history."""
self.memory.clear()
Step 5: Flask Web Application
Let’s create a user-friendly web interface:
# app.py
import os
from flask import Flask, render_template, request, jsonify
from werkzeug.utils import secure_filename
from dotenv import load_dotenv
from document_processor import DocumentProcessor
from vector_store import VectorStore
from qa_chain import DocumentQA
# Load environment variables
load_dotenv()
app = Flask(__name__)
app.config["UPLOAD_FOLDER"] = "uploads"
app.config["MAX_CONTENT_LENGTH"] = 50 * 1024 * 1024 # 50MB max
# Ensure directories exist
os.makedirs(app.config["UPLOAD_FOLDER"], exist_ok=True)
os.makedirs("chroma_db", exist_ok=True)
# Initialize components
document_processor = DocumentProcessor(chunk_size=1000, chunk_overlap=200)
vector_store = VectorStore(
persist_directory="./chroma_db",
embedding_provider="huggingface" # Free, no API key needed
)
qa_system = DocumentQA(vector_store=vector_store)
ALLOWED_EXTENSIONS = {"pdf", "docx", "doc", "txt"}
def allowed_file(filename):
return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS
@app.route("/")
def index():
stats = vector_store.get_collection_stats()
return render_template("index.html", doc_count=stats["count"])
@app.route("/upload", methods=["POST"])
def upload_document():
if "file" not in request.files:
return jsonify({"error": "No file provided"}), 400
file = request.files["file"]
if file.filename == "":
return jsonify({"error": "No file selected"}), 400
if not allowed_file(file.filename):
return jsonify({"error": "File type not supported"}), 400
try:
# Save file
filename = secure_filename(file.filename)
file_path = os.path.join(app.config["UPLOAD_FOLDER"], filename)
file.save(file_path)
# Process document
chunks = document_processor.process_file(file_path)
# Add to vector store
vector_store.add_documents(chunks)
return jsonify({
"success": True,
"message": f"Successfully processed {filename}",
"chunks_created": len(chunks)
})
except Exception as e:
return jsonify({"error": str(e)}), 500
@app.route("/ask", methods=["POST"])
def ask_question():
data = request.get_json()
question = data.get("question", "").strip()
if not question:
return jsonify({"error": "No question provided"}), 400
try:
result = qa_system.answer_question(question, k=4)
return jsonify(result)
except Exception as e:
return jsonify({"error": str(e)}), 500
@app.route("/chat", methods=["POST"])
def chat():
data = request.get_json()
question = data.get("question", "").strip()
if not question:
return jsonify({"error": "No question provided"}), 400
try:
result = qa_system.chat(question)
return jsonify(result)
except Exception as e:
return jsonify({"error": str(e)}), 500
@app.route("/clear_chat", methods=["POST"])
def clear_chat():
qa_system.clear_memory()
return jsonify({"success": True})
@app.route("/stats")
def get_stats():
return jsonify(vector_store.get_collection_stats())
if __name__ == "__main__":
app.run(debug=True, port=5000)
Step 6: Web Interface
Create a clean, functional interface:
<!-- templates/index.html -->
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Document Q&A System</title>
<style>
* { box-sizing: border-box; margin: 0; padding: 0; }
body {
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
background: #f5f5f5;
min-height: 100vh;
padding: 20px;
}
.container { max-width: 900px; margin: 0 auto; }
h1 { text-align: center; color: #333; margin-bottom: 30px; }
.card {
background: white;
border-radius: 12px;
padding: 25px;
margin-bottom: 20px;
box-shadow: 0 2px 10px rgba(0,0,0,0.1);
}
.upload-zone {
border: 2px dashed #ddd;
border-radius: 8px;
padding: 40px;
text-align: center;
cursor: pointer;
transition: border-color 0.3s;
}
.upload-zone:hover { border-color: #007bff; }
.upload-zone.dragover { border-color: #007bff; background: #f8f9ff; }
#file-input { display: none; }
.stats {
display: flex;
gap: 20px;
margin-bottom: 20px;
flex-wrap: wrap;
}
.stat-item {
background: #e3f2fd;
padding: 15px 25px;
border-radius: 8px;
text-align: center;
}
.stat-value { font-size: 24px; font-weight: bold; color: #1976d2; }
.stat-label { font-size: 12px; color: #666; }
.chat-container {
height: 400px;
overflow-y: auto;
border: 1px solid #eee;
border-radius: 8px;
padding: 15px;
margin-bottom: 15px;
background: #fafafa;
}
.message {
margin-bottom: 15px;
padding: 12px 16px;
border-radius: 12px;
max-width: 85%;
}
.user-message {
background: #007bff;
color: white;
margin-left: auto;
}
.assistant-message {
background: white;
border: 1px solid #eee;
}
.sources {
font-size: 12px;
color: #666;
margin-top: 8px;
padding-top: 8px;
border-top: 1px solid #eee;
}
.input-group {
display: flex;
gap: 10px;
}
input[type="text"] {
flex: 1;
padding: 12px 16px;
border: 1px solid #ddd;
border-radius: 8px;
font-size: 16px;
}
button {
padding: 12px 24px;
background: #007bff;
color: white;
border: none;
border-radius: 8px;
cursor: pointer;
font-size: 16px;
transition: background 0.3s;
}
button:hover { background: #0056b3; }
button:disabled { background: #ccc; cursor: not-allowed; }
.loading { opacity: 0.6; pointer-events: none; }
</style>
</head>
<body>
<div class="container">
<h1>📚 Document Q&A System</h1>
<div class="card">
<h3>Upload Documents</h3>
<div class="upload-zone" id="upload-zone">
<p>📁 Drag and drop files here or click to browse</p>
<p style="font-size: 12px; color: #666; margin-top: 10px;">
Supported: PDF, DOCX, DOC, TXT (Max 50MB)
</p>
<input type="file" id="file-input" accept=".pdf,.docx,.doc,.txt">
</div>
<div id="upload-status" style="margin-top: 15px;"></div>
</div>
<div class="stats">
<div class="stat-item">
<div class="stat-value" id="doc-count">{{ doc_count }}</div>
<div class="stat-label">Document Chunks</div>
</div>
</div>
<div class="card">
<h3>Ask Questions</h3>
<div class="chat-container" id="chat-container">
<div class="message assistant-message">
Hello! Upload some documents and ask me questions about them.
</div>
</div>
<div class="input-group">
<input type="text" id="question-input" placeholder="Ask a question about your documents...">
<button id="ask-btn">Ask</button>
<button id="clear-btn" style="background: #6c757d;">Clear</button>
</div>
</div>
</div>
<script>
const uploadZone = document.getElementById("upload-zone");
const fileInput = document.getElementById("file-input");
const uploadStatus = document.getElementById("upload-status");
const chatContainer = document.getElementById("chat-container");
const questionInput = document.getElementById("question-input");
const askBtn = document.getElementById("ask-btn");
const clearBtn = document.getElementById("clear-btn");
// Upload handling
uploadZone.addEventListener("click", () => fileInput.click());
uploadZone.addEventListener("dragover", (e) => {
e.preventDefault();
uploadZone.classList.add("dragover");
});
uploadZone.addEventListener("dragleave", () => {
uploadZone.classList.remove("dragover");
});
uploadZone.addEventListener("drop", (e) => {
e.preventDefault();
uploadZone.classList.remove("dragover");
handleFile(e.dataTransfer.files[0]);
});
fileInput.addEventListener("change", () => handleFile(fileInput.files[0]));
async function handleFile(file) {
if (!file) return;
const formData = new FormData();
formData.append("file", file);
uploadStatus.innerHTML = "⏳ Processing " + file.name + "...";
try {
const response = await fetch("/upload", {
method: "POST",
body: formData
});
const data = await response.json();
if (data.success) {
uploadStatus.innerHTML = "✅ " + data.message + " (" + data.chunks_created + " chunks)";
updateStats();
} else {
uploadStatus.innerHTML = "❌ Error: " + data.error;
}
} catch (error) {
uploadStatus.innerHTML = "❌ Upload failed: " + error.message;
}
}
async function updateStats() {
const response = await fetch("/stats");
const data = await response.json();
document.getElementById("doc-count").textContent = data.count;
}
// Chat handling
askBtn.addEventListener("click", askQuestion);
questionInput.addEventListener("keypress", (e) => {
if (e.key === "Enter") askQuestion();
});
clearBtn.addEventListener("click", clearChat);
async function askQuestion() {
const question = questionInput.value.trim();
if (!question) return;
addMessage(question, "user");
questionInput.value = "";
askBtn.disabled = true;
try {
const response = await fetch("/chat", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ question })
});
const data = await response.json();
if (data.error) {
addMessage("Error: " + data.error, "assistant");
} else {
addMessage(data.answer, "assistant", data.sources);
}
} catch (error) {
addMessage("Error: " + error.message, "assistant");
}
askBtn.disabled = false;
}
function addMessage(text, type, sources = []) {
const div = document.createElement("div");
div.className = "message " + type + "-message";
div.textContent = text;
if (sources && sources.length > 0) {
const sourcesDiv = document.createElement("div");
sourcesDiv.className = "sources";
sourcesDiv.textContent = "📄 Sources: " + sources.join(", ");
div.appendChild(sourcesDiv);
}
chatContainer.appendChild(div);
chatContainer.scrollTop = chatContainer.scrollHeight;
}
async function clearChat() {
await fetch("/clear_chat", { method: "POST" });
chatContainer.innerHTML = "<div class=\"message assistant-message\">Chat cleared. Ask me anything about your documents!</div>";
}
</script>
</body>
</html>
Advanced Features
Hybrid Search
Combine keyword (BM25) and semantic search for better results. Note that BM25Retriever depends on the rank_bm25 package, which is not in the earlier install list (pip install rank_bm25):
# Add to vector_store.py
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever
def get_hybrid_retriever(self, documents: List[Document], weights: List[float] = [0.5, 0.5]):
"""Create a hybrid retriever combining BM25 and semantic search."""
# BM25 for keyword matching
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 4
# Semantic retriever
semantic_retriever = self.vector_store.as_retriever(
search_type="mmr",
search_kwargs={"k": 4}
)
# Combine both
ensemble_retriever = EnsembleRetriever(
retrievers=[bm25_retriever, semantic_retriever],
weights=weights
)
return ensemble_retriever
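The ensemble retriever drops into the same chain interface used in Step 4. The sketch below assumes chunks is the list of Document chunks you indexed (BM25 needs the raw text, so keep it around) and qa is an existing DocumentQA instance:
# Sketch: swap the hybrid retriever into a conversational chain
# (`chunks` and `qa` are assumed to exist as described above)
from langchain.chains import ConversationalRetrievalChain

hybrid_retriever = qa.vector_store.get_hybrid_retriever(chunks)
chain = ConversationalRetrievalChain.from_llm(
    llm=qa.llm,
    retriever=hybrid_retriever,
    memory=qa.memory,
    return_source_documents=True,
)
result = chain.invoke({"question": "What are the key findings?"})
print(result["answer"])

Query Expansion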
Generate multiple query variations for better retrieval:
# Add to qa_chain.py
def expand_query(self, question: str) -> List[str]:
"""Generate multiple query variations."""
expansion_prompt = """Generate 3 alternative phrasings of this question that might help find relevant information.
Original question: {question}
Return only the 3 alternative questions, one per line."""
response = self.llm.invoke(expansion_prompt.format(question=question))
queries = [question] # Include original
queries.extend(response.content.strip().split("\n"))
return queries[:4] # Limit to 4 total queries
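The method above is not called anywhere yet. One way to use it — a sketch of an additional DocumentQA method — is to retrieve with every variation and de-duplicate the hits before building the context:
# Add to qa_chain.py (sketch): multi-query retrieval with de-duplication
def retrieve_with_expansion(self, question: str, k: int = 4) -> List[Document]:
    """Retrieve with the original question plus its expansions, de-duplicated."""
    seen, results = set(), []
    for query in self.expand_query(question):
        for doc in self.vector_store.similarity_search(query=query, k=k):
            key = (doc.metadata.get("source"), doc.metadata.get("chunk_index"))
            if key not in seen:
                seen.add(key)
                results.append(doc)
    return results[:k * 2]   # cap how much context reaches the prompt

Deployment Considerations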
Production Setup
# config.py
import os
from dataclasses import dataclass
@dataclass
class Config:
# API Keys
OPENAI_API_KEY: str = os.getenv("OPENAI_API_KEY", "")
# Vector Store
CHROMA_PERSIST_DIR: str = os.getenv("CHROMA_PERSIST_DIR", "./chroma_db")
COLLECTION_NAME: str = "production_docs"
# Document Processing
CHUNK_SIZE: int = 1000
CHUNK_OVERLAP: int = 200
# LLM Settings
MODEL_NAME: str = "gpt-3.5-turbo"
TEMPERATURE: float = 0.0
MAX_TOKENS: int = 1000
# Search Settings
RETRIEVAL_K: int = 4
USE_MMR: bool = True
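The dataclass only helps if the application actually reads from it. Here is a sketch of how app.py's initialization could consume these settings instead of hard-coded values:
# Sketch: build the components from Config rather than hard-coded values
from config import Config
from document_processor import DocumentProcessor
from vector_store import VectorStore
from qa_chain import DocumentQA

config = Config()
document_processor = DocumentProcessor(
    chunk_size=config.CHUNK_SIZE,
    chunk_overlap=config.CHUNK_OVERLAP,
)
vector_store = VectorStore(
    persist_directory=config.CHROMA_PERSIST_DIR,
    collection_name=config.COLLECTION_NAME,
)
qa_system = DocumentQA(
    vector_store=vector_store,
    model_name=config.MODEL_NAME,
    temperature=config.TEMPERATURE,
    max_tokens=config.MAX_TOKENS,
)

Testing Your System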
# test_qa.py
from document_processor import DocumentProcessor
from vector_store import VectorStore
from qa_chain import DocumentQA
# Initialize
processor = DocumentProcessor()
store = VectorStore()
qa = DocumentQA(vector_store=store)
# Add test document
test_content = """
Artificial Intelligence (AI) is transforming industries worldwide.
Machine Learning enables computers to learn from data.
Deep Learning uses neural networks with multiple layers.
Natural Language Processing helps computers understand human language.
"""
# Process and add
with open("test_doc.txt", "w") as f:
f.write(test_content)
chunks = processor.process_file("test_doc.txt")
store.add_documents(chunks)
# Test questions
questions = [
"What is AI?",
"How does Machine Learning work?",
"What is Deep Learning?"
]
for q in questions:
result = qa.answer_question(q)
print(f"Q: {q}")
print(f"A: {result['answer']}\n")Conclusion
You’ve built a complete RAG-based Document Q&A system that ingests documents, stores them efficiently, and answers questions grounded in their content. Along the way you applied the core patterns found in production systems: sensible chunking, hybrid search, conversational memory, and a clean web interface.
Key takeaways:
- RAG grounds LLM responses in your actual documents
- Proper chunking is crucial for retrieval quality
- MMR search provides diverse, relevant results
- Local embeddings eliminate API costs for small projects
- Conversation memory enables multi-turn interactions
This foundation can be extended with features like document comparison, multi-language support, citation highlighting, and integration with enterprise document management systems.
