Introduction to RAG-Based Document Q&A
Retrieval-Augmented Generation (RAG) has revolutionized how we interact with documents and knowledge bases. Instead of relying solely on a language model’s pre-trained knowledge, RAG systems retrieve relevant information from your own documents and use it to generate accurate, contextual answers.
In this comprehensive tutorial, you’ll build a complete Document Q&A system that can answer questions about any PDF, Word document, or text file you provide. This is the same technology powering enterprise knowledge management systems, customer support bots, and research assistants.
What You’ll Build
By the end of this tutorial, you’ll have a fully functional system that:
- Ingests PDF, DOCX, and TXT documents
- Splits documents into semantic chunks
- Creates vector embeddings for efficient retrieval
- Finds relevant context for any question
- Generates accurate answers using retrieved context
- Provides a web interface for easy interaction
Understanding RAG Architecture
Before we dive into code, let’s understand how RAG works:
The RAG Pipeline
1. Document Ingestion: Load documents from various formats (PDF, DOCX, TXT)
2. Text Chunking: Split documents into smaller, semantically meaningful chunks
3. Embedding Generation: Convert text chunks into vector embeddings
4. Vector Storage: Store embeddings in a vector database for fast retrieval
5. Query Processing: Convert user questions into embeddings
6. Retrieval: Find the most similar document chunks to the query
7. Generation: Use retrieved context to generate accurate answers
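Conceptually, these seven steps collapse into two phases: an offline indexing phase (steps 1–4) and an online query phase (steps 5–7). The sketch below is illustrative only — the loader, vector store, and LLM are placeholders for the real components we build in the steps that follow:
# Conceptual sketch of the two RAG phases; the concrete loader, splitter,
# vector store, and LLM are built in Steps 1-4 and simply passed in here.
from typing import Callable, List

def index_documents(file_paths: List[str], load_and_split: Callable, vector_store) -> None:
    """Offline phase: load, chunk, embed, and store documents (steps 1-4)."""
    for path in file_paths:
        chunks = load_and_split(path)        # ingestion + chunking
        vector_store.add_documents(chunks)   # embedding + vector storage

def answer_question(question: str, vector_store, llm) -> str:
    """Online phase: retrieve relevant chunks, then generate an answer (steps 5-7)."""
    docs = vector_store.similarity_search(question, k=4)
    context = "\n\n".join(doc.page_content for doc in docs)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm.invoke(prompt).content        # assumes a chat model such as ChatOpenAI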
Why RAG Matters
Traditional language models have limitations:
- Knowledge cutoff dates mean they lack recent information
- They can’t access your private documents
- They sometimes hallucinate or make up facts
RAG solves these problems by grounding responses in your actual documents, providing source attribution, and enabling updates without retraining.
Prerequisites and Setup
Required Libraries
# Create virtual environment
python -m venv rag_env
source rag_env/bin/activate # On Windows: rag_env\Scripts\activate
# Install required packages
pip install langchain langchain-openai langchain-community
pip install chromadb sentence-transformers
pip install pypdf python-docx unstructured
pip install flask flask-cors
pip install openai tiktoken
pip install python-dotenv
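The LLM used in Step 4 (and the optional OpenAI embedding provider) reads OPENAI_API_KEY from the environment, and app.py calls load_dotenv() at startup, so the easiest setup is a .env file in the project root. A minimal example — the key value is a placeholder:
# .env -- loaded by python-dotenv at startup; keep this file out of version control
OPENAI_API_KEY=sk-your-key-here

Project Structure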
document_qa/
├── app.py # Flask web application
├── document_processor.py # Document loading and chunking
├── embeddings.py # Embedding generation
├── vector_store.py # ChromaDB operations
├── qa_chain.py # Question answering logic
├── config.py # Configuration settings
├── requirements.txt # Dependencies
├── templates/
│ └── index.html # Web interface
├── uploads/ # Uploaded documents
└── chroma_db/ # Vector database storage
Step 1: Document Processing
First, let’s create a robust document processor that handles multiple file formats:
# document_processor.py
import os
from typing import List, Optional
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import (
PyPDFLoader,
Docx2txtLoader,
TextLoader,
UnstructuredFileLoader
)
from langchain.schema import Document
class DocumentProcessor:
"""Handles document loading and text chunking."""
def __init__(
self,
chunk_size: int = 1000,
chunk_overlap: int = 200,
separators: Optional[List[str]] = None
):
self.chunk_size = chunk_size
self.chunk_overlap = chunk_overlap
self.separators = separators or ["\n\n", "\n", ".", "!", "?", ",", " ", ""]
self.text_splitter = RecursiveCharacterTextSplitter(
chunk_size=self.chunk_size,
chunk_overlap=self.chunk_overlap,
separators=self.separators,
length_function=len
)
# Map file extensions to loaders
self.loader_map = {
".pdf": PyPDFLoader,
".docx": Docx2txtLoader,
".doc": Docx2txtLoader,
".txt": TextLoader,
}
def load_document(self, file_path: str) -> List[Document]:
"""Load a document from file path."""
_, ext = os.path.splitext(file_path.lower())
if ext not in self.loader_map:
# Try unstructured loader as fallback
loader = UnstructuredFileLoader(file_path)
else:
loader = self.loader_map[ext](file_path)
try:
documents = loader.load()
# Add source metadata
for doc in documents:
doc.metadata["source"] = os.path.basename(file_path)
return documents
except Exception as e:
raise ValueError(f"Error loading {file_path}: {str(e)}")
def split_documents(self, documents: List[Document]) -> List[Document]:
"""Split documents into smaller chunks."""
chunks = self.text_splitter.split_documents(documents)
# Add chunk indices to metadata
for i, chunk in enumerate(chunks):
chunk.metadata["chunk_index"] = i
return chunks
def process_file(self, file_path: str) -> List[Document]:
"""Load and split a single file."""
documents = self.load_document(file_path)
return self.split_documents(documents)
def process_directory(self, directory_path: str) -> List[Document]:
"""Process all supported files in a directory."""
all_chunks = []
for filename in os.listdir(directory_path):
file_path = os.path.join(directory_path, filename)
if os.path.isfile(file_path):
_, ext = os.path.splitext(filename.lower())
if ext in self.loader_map:
try:
chunks = self.process_file(file_path)
all_chunks.extend(chunks)
print(f"Processed: {filename} ({len(chunks)} chunks)")
except Exception as e:
print(f"Error processing {filename}: {e}")
return all_chunks
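Before moving on, it is worth a quick sanity check of the chunker. The snippet below is a sketch — sample.pdf is a placeholder for any document you have on hand:
# Quick check: process one file and inspect the first chunk
from document_processor import DocumentProcessor

processor = DocumentProcessor(chunk_size=1000, chunk_overlap=200)
chunks = processor.process_file("sample.pdf")   # placeholder file name
print(f"{len(chunks)} chunks created")
print(chunks[0].metadata)            # includes 'source' and 'chunk_index'
print(chunks[0].page_content[:200])  # preview the first chunk

Step 2: Embedding Generation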
Now let’s create the embedding system. We’ll support both OpenAI embeddings and free local alternatives:
# embeddings.py
from typing import List, Optional
from langchain_openai import OpenAIEmbeddings
from langchain_community.embeddings import HuggingFaceEmbeddings
import os
class EmbeddingManager:
"""Manages embedding generation with multiple providers."""
def __init__(
self,
provider: str = "openai",
model_name: Optional[str] = None
):
self.provider = provider
if provider == "openai":
self.model_name = model_name or "text-embedding-3-small"
self.embeddings = OpenAIEmbeddings(
model=self.model_name,
openai_api_key=os.getenv("OPENAI_API_KEY")
)
elif provider == "huggingface":
# Free local embeddings - no API key needed
self.model_name = model_name or "sentence-transformers/all-MiniLM-L6-v2"
self.embeddings = HuggingFaceEmbeddings(
model_name=self.model_name,
model_kwargs={"device": "cpu"},
encode_kwargs={"normalize_embeddings": True}
)
else:
raise ValueError(f"Unknown provider: {provider}")
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Generate embeddings for a list of texts."""
return self.embeddings.embed_documents(texts)
def embed_query(self, text: str) -> List[float]:
"""Generate embedding for a single query."""
return self.embeddings.embed_query(text)
def get_embeddings(self):
"""Return the underlying embeddings object for use with vector stores."""
return self.embeddings
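A quick way to confirm the embedding setup (and to see the vector dimensionality your store will index) is to embed a test query — a minimal check using the free local provider:
# Minimal embedding check with the local HuggingFace provider
from embeddings import EmbeddingManager

manager = EmbeddingManager(provider="huggingface")
vector = manager.embed_query("What is retrieval-augmented generation?")
print(len(vector))   # 384 for sentence-transformers/all-MiniLM-L6-v2

Step 3: Vector Store with ChromaDB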
ChromaDB provides efficient vector storage and retrieval:
# vector_store.py
import os
from typing import List, Optional, Dict, Any
from langchain_community.vectorstores import Chroma
from langchain.schema import Document
from embeddings import EmbeddingManager
class VectorStore:
"""Manages vector storage and retrieval using ChromaDB."""
def __init__(
self,
persist_directory: str = "./chroma_db",
collection_name: str = "documents",
embedding_provider: str = "huggingface"
):
self.persist_directory = persist_directory
self.collection_name = collection_name
# Initialize embedding manager
self.embedding_manager = EmbeddingManager(provider=embedding_provider)
# Initialize or load existing vector store
self.vector_store = Chroma(
collection_name=self.collection_name,
embedding_function=self.embedding_manager.get_embeddings(),
persist_directory=self.persist_directory
)
def add_documents(self, documents: List[Document]) -> List[str]:
"""Add documents to the vector store."""
ids = self.vector_store.add_documents(documents)
return ids
def similarity_search(
self,
query: str,
k: int = 4,
filter: Optional[Dict[str, Any]] = None
) -> List[Document]:
"""Search for similar documents."""
return self.vector_store.similarity_search(
query=query,
k=k,
filter=filter
)
def similarity_search_with_score(
self,
query: str,
k: int = 4
) -> List[tuple]:
"""Search with relevance scores."""
return self.vector_store.similarity_search_with_score(
query=query,
k=k
)
def max_marginal_relevance_search(
self,
query: str,
k: int = 4,
fetch_k: int = 20,
lambda_mult: float = 0.5
) -> List[Document]:
"""MMR search for diverse results."""
return self.vector_store.max_marginal_relevance_search(
query=query,
k=k,
fetch_k=fetch_k,
lambda_mult=lambda_mult
)
def delete_collection(self):
"""Delete the entire collection."""
self.vector_store.delete_collection()
def get_collection_stats(self) -> Dict[str, Any]:
"""Get statistics about the collection."""
collection = self.vector_store._collection
return {
"name": self.collection_name,
"count": collection.count(),
"persist_directory": self.persist_directory
}
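Before wiring in an LLM, you can confirm retrieval on its own. The snippet below is a standalone check with two hand-made chunks; note that Chroma's similarity_search_with_score returns distances, so lower scores mean closer matches:
# Retrieval-only check: index two small chunks, then inspect the scores
from langchain.schema import Document
from vector_store import VectorStore

store = VectorStore(embedding_provider="huggingface")
store.add_documents([
    Document(page_content="RAG retrieves relevant chunks before generating an answer.",
             metadata={"source": "notes.txt", "chunk_index": 0}),
    Document(page_content="ChromaDB persists vector embeddings to disk.",
             metadata={"source": "notes.txt", "chunk_index": 1}),
])

for doc, score in store.similarity_search_with_score("How are embeddings stored?", k=2):
    print(round(score, 3), doc.metadata["source"], doc.page_content)

Step 4: Question Answering Chain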
Now let’s build the QA system that ties everything together:
# qa_chain.py
import os
from typing import List, Dict, Any, Optional
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema import Document
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from vector_store import VectorStore
class DocumentQA:
"""Question answering system using RAG."""
def __init__(
self,
vector_store: VectorStore,
model_name: str = "gpt-3.5-turbo",
temperature: float = 0.0,
max_tokens: int = 1000
):
self.vector_store = vector_store
# Initialize LLM
self.llm = ChatOpenAI(
model_name=model_name,
temperature=temperature,
max_tokens=max_tokens,
openai_api_key=os.getenv("OPENAI_API_KEY")
)
# Conversation memory
self.memory = ConversationBufferMemory(
memory_key="chat_history",
return_messages=True,
output_key="answer"
)
# QA prompt template
self.qa_prompt = ChatPromptTemplate.from_messages([
("system", """You are a helpful assistant that answers questions based on the provided context.
Instructions:
- Answer the question using ONLY the information from the context provided
- If the answer is not in the context, say "I could not find this information in the documents"
- Be concise but comprehensive
- If relevant, mention which document the information comes from
- Do not make up information that is not in the context"""),
("human", """Context from documents:
{context}
Question: {question}
Please provide a helpful answer based on the context above.""")
])
def format_context(self, documents: List[Document]) -> str:
"""Format retrieved documents into context string."""
context_parts = []
for i, doc in enumerate(documents, 1):
source = doc.metadata.get("source", "Unknown")
chunk_idx = doc.metadata.get("chunk_index", "N/A")
context_parts.append(
f"[Document {i} - {source} (chunk {chunk_idx})]:\n{doc.page_content}"
)
return "\n\n---\n\n".join(context_parts)
def answer_question(
self,
question: str,
k: int = 4,
use_mmr: bool = True
) -> Dict[str, Any]:
"""Answer a question using RAG."""
# Retrieve relevant documents
if use_mmr:
documents = self.vector_store.max_marginal_relevance_search(
query=question,
k=k,
fetch_k=k * 3
)
else:
documents = self.vector_store.similarity_search(
query=question,
k=k
)
if not documents:
return {
"answer": "No relevant documents found to answer your question.",
"sources": [],
"context": ""
}
# Format context
context = self.format_context(documents)
# Generate answer
messages = self.qa_prompt.format_messages(
context=context,
question=question
)
response = self.llm.invoke(messages)
# Extract source information
sources = list(set([
doc.metadata.get("source", "Unknown")
for doc in documents
]))
return {
"answer": response.content,
"sources": sources,
"context": context,
"num_chunks_retrieved": len(documents)
}
def get_conversational_chain(self):
"""Get a conversational chain for multi-turn conversations."""
return ConversationalRetrievalChain.from_llm(
llm=self.llm,
retriever=self.vector_store.vector_store.as_retriever(
search_type="mmr",
search_kwargs={"k": 4, "fetch_k": 12}
),
memory=self.memory,
return_source_documents=True,
verbose=False
)
def chat(self, question: str) -> Dict[str, Any]:
"""Handle conversational Q&A with memory."""
chain = self.get_conversational_chain()
result = chain({"question": question})
return {
"answer": result["answer"],
"sources": [
doc.metadata.get("source", "Unknown")
for doc in result.get("source_documents", [])
]
}
def clear_memory(self):
"""Clear conversation history."""
self.memory.clear()
Step 5: Flask Web Application
Let’s create a user-friendly web interface:
# app.py
import os
from flask import Flask, render_template, request, jsonify
from werkzeug.utils import secure_filename
from dotenv import load_dotenv
from document_processor import DocumentProcessor
from vector_store import VectorStore
from qa_chain import DocumentQA
# Load environment variables
load_dotenv()
app = Flask(__name__)
app.config["UPLOAD_FOLDER"] = "uploads"
app.config["MAX_CONTENT_LENGTH"] = 50 * 1024 * 1024 # 50MB max
# Ensure directories exist
os.makedirs(app.config["UPLOAD_FOLDER"], exist_ok=True)
os.makedirs("chroma_db", exist_ok=True)
# Initialize components
document_processor = DocumentProcessor(chunk_size=1000, chunk_overlap=200)
vector_store = VectorStore(
persist_directory="./chroma_db",
embedding_provider="huggingface" # Free, no API key needed
)
qa_system = DocumentQA(vector_store=vector_store)
ALLOWED_EXTENSIONS = {"pdf", "docx", "doc", "txt"}
def allowed_file(filename):
return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS
@app.route("/")
def index():
stats = vector_store.get_collection_stats()
return render_template("index.html", doc_count=stats["count"])
@app.route("/upload", methods=["POST"])
def upload_document():
if "file" not in request.files:
return jsonify({"error": "No file provided"}), 400
file = request.files["file"]
if file.filename == "":
return jsonify({"error": "No file selected"}), 400
if not allowed_file(file.filename):
return jsonify({"error": "File type not supported"}), 400
try:
# Save file
filename = secure_filename(file.filename)
file_path = os.path.join(app.config["UPLOAD_FOLDER"], filename)
file.save(file_path)
# Process document
chunks = document_processor.process_file(file_path)
# Add to vector store
vector_store.add_documents(chunks)
return jsonify({
"success": True,
"message": f"Successfully processed {filename}",
"chunks_created": len(chunks)
})
except Exception as e:
return jsonify({"error": str(e)}), 500
@app.route("/ask", methods=["POST"])
def ask_question():
data = request.get_json()
question = data.get("question", "").strip()
if not question:
return jsonify({"error": "No question provided"}), 400
try:
result = qa_system.answer_question(question, k=4)
return jsonify(result)
except Exception as e:
return jsonify({"error": str(e)}), 500
@app.route("/chat", methods=["POST"])
def chat():
data = request.get_json()
question = data.get("question", "").strip()
if not question:
return jsonify({"error": "No question provided"}), 400
try:
result = qa_system.chat(question)
return jsonify(result)
except Exception as e:
return jsonify({"error": str(e)}), 500
@app.route("/clear_chat", methods=["POST"])
def clear_chat():
qa_system.clear_memory()
return jsonify({"success": True})
@app.route("/stats")
def get_stats():
return jsonify(vector_store.get_collection_stats())
if __name__ == "__main__":
app.run(debug=True, port=5000)
Step 6: Web Interface
Create a clean, functional interface:
<!-- templates/index.html -->
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Document Q&A System</title>
<style>
* { box-sizing: border-box; margin: 0; padding: 0; }
body {
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
background: #f5f5f5;
min-height: 100vh;
padding: 20px;
}
.container { max-width: 900px; margin: 0 auto; }
h1 { text-align: center; color: #333; margin-bottom: 30px; }
.card {
background: white;
border-radius: 12px;
padding: 25px;
margin-bottom: 20px;
box-shadow: 0 2px 10px rgba(0,0,0,0.1);
}
.upload-zone {
border: 2px dashed #ddd;
border-radius: 8px;
padding: 40px;
text-align: center;
cursor: pointer;
transition: border-color 0.3s;
}
.upload-zone:hover { border-color: #007bff; }
.upload-zone.dragover { border-color: #007bff; background: #f8f9ff; }
#file-input { display: none; }
.stats {
display: flex;
gap: 20px;
margin-bottom: 20px;
flex-wrap: wrap;
}
.stat-item {
background: #e3f2fd;
padding: 15px 25px;
border-radius: 8px;
text-align: center;
}
.stat-value { font-size: 24px; font-weight: bold; color: #1976d2; }
.stat-label { font-size: 12px; color: #666; }
.chat-container {
height: 400px;
overflow-y: auto;
border: 1px solid #eee;
border-radius: 8px;
padding: 15px;
margin-bottom: 15px;
background: #fafafa;
}
.message {
margin-bottom: 15px;
padding: 12px 16px;
border-radius: 12px;
max-width: 85%;
}
.user-message {
background: #007bff;
color: white;
margin-left: auto;
}
.assistant-message {
background: white;
border: 1px solid #eee;
}
.sources {
font-size: 12px;
color: #666;
margin-top: 8px;
padding-top: 8px;
border-top: 1px solid #eee;
}
.input-group {
display: flex;
gap: 10px;
}
input[type="text"] {
flex: 1;
padding: 12px 16px;
border: 1px solid #ddd;
border-radius: 8px;
font-size: 16px;
}
button {
padding: 12px 24px;
background: #007bff;
color: white;
border: none;
border-radius: 8px;
cursor: pointer;
font-size: 16px;
transition: background 0.3s;
}
button:hover { background: #0056b3; }
button:disabled { background: #ccc; cursor: not-allowed; }
.loading { opacity: 0.6; pointer-events: none; }
</style>
</head>
<body>
<div class="container">
<h1>📚 Document Q&A System</h1>
<div class="card">
<h3>Upload Documents</h3>
<div class="upload-zone" id="upload-zone">
<p>📁 Drag and drop files here or click to browse</p>
<p style="font-size: 12px; color: #666; margin-top: 10px;">
Supported: PDF, DOCX, DOC, TXT (Max 50MB)
</p>
<input type="file" id="file-input" accept=".pdf,.docx,.doc,.txt">
</div>
<div id="upload-status" style="margin-top: 15px;"></div>
</div>
<div class="stats">
<div class="stat-item">
<div class="stat-value" id="doc-count">{{ doc_count }}</div>
<div class="stat-label">Document Chunks</div>
</div>
</div>
<div class="card">
<h3>Ask Questions</h3>
<div class="chat-container" id="chat-container">
<div class="message assistant-message">
Hello! Upload some documents and ask me questions about them.
</div>
</div>
<div class="input-group">
<input type="text" id="question-input" placeholder="Ask a question about your documents...">
<button id="ask-btn">Ask</button>
<button id="clear-btn" style="background: #6c757d;">Clear</button>
</div>
</div>
</div>
<script>
const uploadZone = document.getElementById("upload-zone");
const fileInput = document.getElementById("file-input");
const uploadStatus = document.getElementById("upload-status");
const chatContainer = document.getElementById("chat-container");
const questionInput = document.getElementById("question-input");
const askBtn = document.getElementById("ask-btn");
const clearBtn = document.getElementById("clear-btn");
// Upload handling
uploadZone.addEventListener("click", () => fileInput.click());
uploadZone.addEventListener("dragover", (e) => {
e.preventDefault();
uploadZone.classList.add("dragover");
});
uploadZone.addEventListener("dragleave", () => {
uploadZone.classList.remove("dragover");
});
uploadZone.addEventListener("drop", (e) => {
e.preventDefault();
uploadZone.classList.remove("dragover");
handleFile(e.dataTransfer.files[0]);
});
fileInput.addEventListener("change", () => handleFile(fileInput.files[0]));
async function handleFile(file) {
if (!file) return;
const formData = new FormData();
formData.append("file", file);
uploadStatus.innerHTML = "⏳ Processing " + file.name + "...";
try {
const response = await fetch("/upload", {
method: "POST",
body: formData
});
const data = await response.json();
if (data.success) {
uploadStatus.innerHTML = "✅ " + data.message + " (" + data.chunks_created + " chunks)";
updateStats();
} else {
uploadStatus.innerHTML = "❌ Error: " + data.error;
}
} catch (error) {
uploadStatus.innerHTML = "❌ Upload failed: " + error.message;
}
}
async function updateStats() {
const response = await fetch("/stats");
const data = await response.json();
document.getElementById("doc-count").textContent = data.count;
}
// Chat handling
askBtn.addEventListener("click", askQuestion);
questionInput.addEventListener("keypress", (e) => {
if (e.key === "Enter") askQuestion();
});
clearBtn.addEventListener("click", clearChat);
async function askQuestion() {
const question = questionInput.value.trim();
if (!question) return;
addMessage(question, "user");
questionInput.value = "";
askBtn.disabled = true;
try {
const response = await fetch("/chat", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ question })
});
const data = await response.json();
if (data.error) {
addMessage("Error: " + data.error, "assistant");
} else {
addMessage(data.answer, "assistant", data.sources);
}
} catch (error) {
addMessage("Error: " + error.message, "assistant");
}
askBtn.disabled = false;
}
function addMessage(text, type, sources = []) {
const div = document.createElement("div");
div.className = "message " + type + "-message";
div.textContent = text;
if (sources && sources.length > 0) {
const sourcesDiv = document.createElement("div");
sourcesDiv.className = "sources";
sourcesDiv.textContent = "📄 Sources: " + sources.join(", ");
div.appendChild(sourcesDiv);
}
chatContainer.appendChild(div);
chatContainer.scrollTop = chatContainer.scrollHeight;
}
async function clearChat() {
await fetch("/clear_chat", { method: "POST" });
chatContainer.innerHTML = "<div class=\"message assistant-message\">Chat cleared. Ask me anything about your documents!</div>";
}
</script>
</body>
</html>
Advanced Features
Hybrid Search
Combine keyword (BM25) and semantic search for better results. Note that BM25Retriever depends on the rank_bm25 package, which is not in the earlier install list (pip install rank_bm25):
# Add to vector_store.py
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever
def get_hybrid_retriever(self, documents: List[Document], weights: List[float] = [0.5, 0.5]):
"""Create a hybrid retriever combining BM25 and semantic search."""
# BM25 for keyword matching
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 4
# Semantic retriever
semantic_retriever = self.vector_store.as_retriever(
search_type="mmr",
search_kwargs={"k": 4}
)
# Combine both
ensemble_retriever = EnsembleRetriever(
retrievers=[bm25_retriever, semantic_retriever],
weights=weights
)
return ensemble_retriever
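The ensemble retriever drops into the same chain interface used in Step 4. The sketch below assumes chunks is the list of Document chunks you indexed (BM25 needs the raw text, so keep it around) and qa is an existing DocumentQA instance:
# Sketch: swap the hybrid retriever into a conversational chain
# (`chunks` and `qa` are assumed to exist as described above)
from langchain.chains import ConversationalRetrievalChain

hybrid_retriever = qa.vector_store.get_hybrid_retriever(chunks)
chain = ConversationalRetrievalChain.from_llm(
    llm=qa.llm,
    retriever=hybrid_retriever,
    memory=qa.memory,
    return_source_documents=True,
)
result = chain.invoke({"question": "What are the key findings?"})
print(result["answer"])

Query Expansion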
Generate multiple query variations for better retrieval:
# Add to qa_chain.py
def expand_query(self, question: str) -> List[str]:
"""Generate multiple query variations."""
expansion_prompt = """Generate 3 alternative phrasings of this question that might help find relevant information.
Original question: {question}
Return only the 3 alternative questions, one per line."""
response = self.llm.invoke(expansion_prompt.format(question=question))
queries = [question] # Include original
queries.extend(response.content.strip().split("\n"))
return queries[:4] # Limit to 4 total queries
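The method above is not called anywhere yet. One way to use it — a sketch of an additional DocumentQA method — is to retrieve with every variation and de-duplicate the hits before building the context:
# Add to qa_chain.py (sketch): multi-query retrieval with de-duplication
def retrieve_with_expansion(self, question: str, k: int = 4) -> List[Document]:
    """Retrieve with the original question plus its expansions, de-duplicated."""
    seen, results = set(), []
    for query in self.expand_query(question):
        for doc in self.vector_store.similarity_search(query=query, k=k):
            key = (doc.metadata.get("source"), doc.metadata.get("chunk_index"))
            if key not in seen:
                seen.add(key)
                results.append(doc)
    return results[:k * 2]   # cap how much context reaches the prompt

Deployment Considerations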
Production Setup
# config.py
import os
from dataclasses import dataclass
@dataclass
class Config:
# API Keys
OPENAI_API_KEY: str = os.getenv("OPENAI_API_KEY", "")
# Vector Store
CHROMA_PERSIST_DIR: str = os.getenv("CHROMA_PERSIST_DIR", "./chroma_db")
COLLECTION_NAME: str = "production_docs"
# Document Processing
CHUNK_SIZE: int = 1000
CHUNK_OVERLAP: int = 200
# LLM Settings
MODEL_NAME: str = "gpt-3.5-turbo"
TEMPERATURE: float = 0.0
MAX_TOKENS: int = 1000
# Search Settings
RETRIEVAL_K: int = 4
USE_MMR: bool = True
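The dataclass only helps if the application actually reads from it. Here is a sketch of how app.py's initialization could consume these settings instead of hard-coded values:
# Sketch: build the components from Config rather than hard-coded values
from config import Config
from document_processor import DocumentProcessor
from vector_store import VectorStore
from qa_chain import DocumentQA

config = Config()
document_processor = DocumentProcessor(
    chunk_size=config.CHUNK_SIZE,
    chunk_overlap=config.CHUNK_OVERLAP,
)
vector_store = VectorStore(
    persist_directory=config.CHROMA_PERSIST_DIR,
    collection_name=config.COLLECTION_NAME,
)
qa_system = DocumentQA(
    vector_store=vector_store,
    model_name=config.MODEL_NAME,
    temperature=config.TEMPERATURE,
    max_tokens=config.MAX_TOKENS,
)

Testing Your System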
# test_qa.py
from document_processor import DocumentProcessor
from vector_store import VectorStore
from qa_chain import DocumentQA
# Initialize
processor = DocumentProcessor()
store = VectorStore()
qa = DocumentQA(vector_store=store)
# Add test document
test_content = """
Artificial Intelligence (AI) is transforming industries worldwide.
Machine Learning enables computers to learn from data.
Deep Learning uses neural networks with multiple layers.
Natural Language Processing helps computers understand human language.
"""
# Process and add
with open("test_doc.txt", "w") as f:
f.write(test_content)
chunks = processor.process_file("test_doc.txt")
store.add_documents(chunks)
# Test questions
questions = [
"What is AI?",
"How does Machine Learning work?",
"What is Deep Learning?"
]
for q in questions:
result = qa.answer_question(q)
print(f"Q: {q}")
print(f"A: {result['answer']}\n")Conclusion
You’ve built a complete RAG-based Document Q&A system that ingests documents, stores them efficiently, and answers questions grounded in their content. Along the way you applied the core patterns found in production systems: sensible chunking, hybrid search, conversational memory, and a clean web interface.
Key takeaways:
- RAG grounds LLM responses in your actual documents
- Proper chunking is crucial for retrieval quality
- MMR search provides diverse, relevant results
- Local embeddings eliminate API costs for small projects
- Conversation memory enables multi-turn interactions
This foundation can be extended with features like document comparison, multi-language support, citation highlighting, and integration with enterprise document management systems.
