
Template Info

Category: AI
Difficulty: Intermediate
Setup Time: 20 minutes
License: MIT
Last Updated: 2024-01-15

Key Features

  • Document ingestion and processing pipeline
  • Vector database integration with ChromaDB
  • FastAPI REST API for RAG queries
  • LangChain integration with multiple LLM providers
  • Pydantic models for type-safe data handling
  • Comprehensive error handling and logging
  • Docker containerization
  • Async processing for better performance

Use Cases

  • Document Q&A systems
  • Knowledge base chatbots
  • Research assistance tools
  • Customer support automation
  • Content discovery platforms

LangChain RAG Application Template

A complete Retrieval-Augmented Generation (RAG) application template using LangChain, FastAPI, and vector databases

Setup Time: 20 minutes
Difficulty: Intermediate
Components: 5 included

Overview

This LangChain RAG (Retrieval-Augmented Generation) template provides a complete foundation for building document-based question-answering systems. It combines the power of vector databases for semantic search with large language models for natural language generation.

The template includes document processing, vector storage, retrieval mechanisms, and a FastAPI interface for easy integration into web applications.

Features

Document Processing Pipeline

  • Multi-format Support: PDF, Word, text files, and web scraping
  • Intelligent Chunking: Smart text splitting with overlap for context preservation (see the sketch after this list)
  • Metadata Extraction: Automatic extraction of document metadata
  • Batch Processing: Efficient processing of large document collections
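
To make the chunking step concrete, here is a minimal sketch using LangChain's RecursiveCharacterTextSplitter with the CHUNK_SIZE and CHUNK_OVERLAP values from the Configuration section; the sample file path is illustrative, and the template's document_processor.py may wire this differently.

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Values mirror CHUNK_SIZE / CHUNK_OVERLAP from .env; the template's
# internal wiring may differ.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

with open("data/sample_docs/example.txt", encoding="utf-8") as f:  # path assumed
    chunks = splitter.split_text(f.read())
print(f"Produced {len(chunks)} chunks with ~200 characters of overlap")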

Vector Database Integration

  • ChromaDB Integration: Local vector database for development
  • Scalable Storage: Support for production vector databases (Pinecone, Weaviate)
  • Semantic Search: Advanced similarity search with filtering
  • Embedding Management: Flexible embedding model configuration
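
As a rough sketch of what the ChromaDB integration amounts to in plain LangChain (the template wraps this in its own VectorStore class, shown under Core Components below):

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# EMBEDDING_MODEL, COLLECTION_NAME, and VECTOR_DB_PATH come from .env.
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
store = Chroma(
    collection_name="documents",
    embedding_function=embeddings,
    persist_directory="./data/vector_db",
)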

LangChain RAG Pipeline

  • Retrieval Chain: Optimized document retrieval with relevance scoring
  • Generation Chain: Context-aware response generation
  • Memory Integration: Conversation history and context management
  • Multi-step Reasoning: Support for complex queries requiring multiple retrievals
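
The template's RAGChain (shown under Core Components) layers memory and multi-step retrieval on top, but the core retrieve-then-generate loop is roughly equivalent to a stock LangChain RetrievalQA chain:

from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

# `store` is the Chroma instance from the previous sketch.
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=store.as_retriever(search_kwargs={"k": 5}),
    return_source_documents=True,
)
result = qa({"query": "What is machine learning?"})
print(result["result"], result["source_documents"])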

FastAPI REST API

  • Type-safe Endpoints: Pydantic models for request/response validation
  • Async Processing: Non-blocking operations for better performance
  • Error Handling: Comprehensive error handling and logging
  • Interactive Documentation: Automatic OpenAPI documentation
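
A stripped-down endpoint in this style might look as follows; the model fields and handler name here are illustrative, not the template's actual definitions in app/api/routes.py:

from typing import List

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class QueryRequest(BaseModel):
    question: str
    top_k: int = 5

class QueryResponse(BaseModel):
    answer: str
    sources: List[str]

@app.post("/query", response_model=QueryResponse)
async def query_documents(req: QueryRequest) -> QueryResponse:
    # A real handler would await the RAG chain; placeholder response here.
    return QueryResponse(answer="...", sources=[])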

Quick Start

Prerequisites

  • Python 3.8 or higher
  • OpenAI API key (or alternative LLM provider)
  • Git
  • Docker (optional)

Installation

  1. Clone the template

    git clone https://github.com/pragmatic-ai-stack/langchain-rag-template.git
    cd langchain-rag-template

  2. Create a virtual environment

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate

  3. Install dependencies

    pip install -r requirements.txt

  4. Set up environment variables

    cp .env.example .env
    # Edit .env with your configuration

  5. Initialize the vector database

    python scripts/setup_database.py

  6. Ingest sample documents

    python scripts/ingest_documents.py --path data/sample_docs/

  7. Run the application

    uvicorn app.main:app --reload

Expected Output

INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [28720]
INFO:     Started server process [28722]
INFO:     Waiting for application startup.
INFO:     Vector database initialized with 150 documents
INFO:     Application startup complete.

Visit http://localhost:8000/docs to see the interactive API documentation.

Project Structure

langchain-rag-template/
├── app/
│   ├── __init__.py
│   ├── main.py                 # FastAPI application
│   ├── config.py              # Configuration settings
│   ├── api/
│   │   ├── __init__.py
│   │   ├── routes.py          # API endpoints
│   │   └── dependencies.py    # FastAPI dependencies
│   ├── models/
│   │   ├── __init__.py
│   │   ├── requests.py        # Request models
│   │   └── responses.py       # Response models
│   ├── rag/
│   │   ├── __init__.py
│   │   ├── document_processor.py  # Document processing
│   │   ├── vector_store.py        # Vector database operations
│   │   ├── retriever.py           # Document retrieval
│   │   ├── generator.py           # Response generation
│   │   └── chains.py              # LangChain chains
│   └── utils/
│       ├── __init__.py
│       ├── logging.py         # Logging configuration
│       └── exceptions.py      # Custom exceptions
├── data/
│   ├── sample_docs/           # Sample documents
│   └── processed/             # Processed document cache
├── tests/
│   ├── __init__.py
│   ├── test_api.py           # API tests
│   ├── test_rag.py           # RAG functionality tests
│   └── conftest.py           # Test configuration
├── scripts/
│   ├── setup_database.py     # Database initialization
│   ├── ingest_documents.py   # Document ingestion
│   └── benchmark.py          # Performance benchmarking
├── docker-compose.yml        # Docker services
├── Dockerfile               # Docker image
├── requirements.txt         # Python dependencies
├── .env.example            # Environment template
└── README.md               # Documentation

Configuration

The template uses environment variables for configuration. Copy .env.example to .env and customize:

# LLM Configuration
OPENAI_API_KEY=your-openai-api-key
LLM_MODEL=gpt-3.5-turbo
LLM_TEMPERATURE=0.7

# Embedding Configuration
EMBEDDING_MODEL=text-embedding-ada-002
EMBEDDING_PROVIDER=openai

# Vector Database
VECTOR_DB_TYPE=chromadb
VECTOR_DB_PATH=./data/vector_db
COLLECTION_NAME=documents

# API Configuration
API_HOST=0.0.0.0
API_PORT=8000
API_WORKERS=1

# Document Processing
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
MAX_DOCUMENTS=10000

# Retrieval Configuration
RETRIEVAL_K=5
SIMILARITY_THRESHOLD=0.7
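
How these variables are surfaced to the code is the job of app/config.py; a common pattern, sketched here as an assumption rather than the template's actual implementation, is a pydantic settings class that reads .env automatically:

# Hypothetical sketch of app/config.py (pydantic v1 style).
from pydantic import BaseSettings

class Settings(BaseSettings):
    openai_api_key: str
    llm_model: str = "gpt-3.5-turbo"
    llm_temperature: float = 0.7
    chunk_size: int = 1000
    chunk_overlap: int = 200
    retrieval_k: int = 5

    class Config:
        env_file = ".env"

settings = Settings()  # defaults above are overridden by .env / environment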

API Endpoints

Document Management

  • POST /documents/upload - Upload and process documents
  • GET /documents/ - List processed documents
  • DELETE /documents/{doc_id} - Remove document from index

RAG Operations

  • POST /query - Ask questions about your documents
  • POST /chat - Conversational interface with memory
  • GET /similar - Find similar documents

System Operations

  • GET /health - Application health check
  • GET /stats - System statistics and metrics
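
For instance, once the server is running you can exercise the query endpoint with a plain HTTP POST (the request body shape is assumed; consult /docs for the authoritative schema):

import requests

resp = requests.post(
    "http://localhost:8000/query",
    json={"question": "What does the sample corpus say about embeddings?"},
)
resp.raise_for_status()
print(resp.json())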

Core Components

Document Processor

from app.rag.document_processor import DocumentProcessor

processor = DocumentProcessor()

# Process a single document
result = await processor.process_document("path/to/document.pdf")

# Batch process multiple documents
results = await processor.process_directory("path/to/documents/")

Vector Store Operations

from app.rag.vector_store import VectorStore

vector_store = VectorStore()

# Add documents to the vector store
await vector_store.add_documents(documents)

# Search for similar documents
results = await vector_store.similarity_search(
    query="What is machine learning?",
    k=5,
    filter={"category": "AI"}
)

RAG Chain

from app.rag.chains import RAGChain

rag_chain = RAGChain()

# Ask a question
response = await rag_chain.query(
    question="How does neural network training work?",
    chat_history=[]
)

print(response.answer)
print(response.source_documents)

Customization Guide

Adding New Document Types

  1. Create a document loader in app/rag/loaders/
  2. Register the loader in document_processor.py
  3. Update the API to accept the new file type
# app/rag/loaders/custom_loader.py
from typing import List

from langchain.document_loaders.base import BaseLoader
from langchain.schema import Document

class CustomDocumentLoader(BaseLoader):
    def __init__(self, file_path: str):
        self.file_path = file_path

    def load(self) -> List[Document]:
        # Your custom loading logic; must return a list of Documents.
        with open(self.file_path, encoding="utf-8") as f:
            text = f.read()
        return [Document(page_content=text, metadata={"source": self.file_path})]

Switching Vector Databases

  1. Update configuration in .env
  2. Implement vector store adapter in app/rag/vector_stores/
  3. Update dependency injection in app/api/dependencies.py
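
The exact adapter surface is defined by the template; as an assumption, a minimal interface mirroring the VectorStore operations used under Core Components would look like:

# Hypothetical base class for app/rag/vector_stores/ adapters;
# method names mirror the VectorStore usage shown above.
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

class BaseVectorStoreAdapter(ABC):
    @abstractmethod
    async def add_documents(self, documents: List[Any]) -> None:
        ...

    @abstractmethod
    async def similarity_search(
        self,
        query: str,
        k: int = 5,
        filter: Optional[Dict[str, Any]] = None,
    ) -> List[Any]:
        ...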

Custom LLM Integration

# app/rag/llms/custom_llm.py
from typing import List, Optional

from langchain.llms.base import LLM

class CustomLLM(LLM):
    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # Your custom LLM integration; must return the completion text.
        # Placeholder echoes the prompt so the class runs as-is.
        return f"[custom-llm] {prompt}"

    @property
    def _llm_type(self) -> str:
        return "custom"

Testing

Run the test suite:

# Run all tests
pytest

# Run with coverage
pytest --cov=app

# Run specific test file
pytest tests/test_rag.py

# Run integration tests
pytest tests/integration/
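
As a starting point for your own tests, a minimal health-check test using FastAPI's TestClient could look like this (the endpoint comes from the API Endpoints section; the import path follows the uvicorn command above):

# tests/test_api.py-style sketch
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)

def test_health_returns_ok():
    resp = client.get("/health")
    assert resp.status_code == 200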

Performance Optimization

Caching Strategies

# Enable response caching
ENABLE_RESPONSE_CACHE=true
CACHE_TTL=3600

# Enable embedding caching
ENABLE_EMBEDDING_CACHE=true
EMBEDDING_CACHE_SIZE=10000
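
If you extend the cache layer yourself, the simplest in-process equivalent of an embedding cache is an LRU wrapper around the embedding call; this sketch is an assumption about intent, not the template's implementation:

from functools import lru_cache
from typing import Tuple

def _embed(text: str) -> Tuple[float, ...]:
    # Stand-in for the configured embedding model call; returns a
    # tuple so cached values stay immutable.
    raise NotImplementedError

@lru_cache(maxsize=10_000)  # mirrors EMBEDDING_CACHE_SIZE
def embed_cached(text: str) -> Tuple[float, ...]:
    # str arguments are hashable, so lru_cache can memoize per text.
    return _embed(text)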

Batch Processing

# Process documents in batches
BATCH_SIZE=50
CONCURRENT_WORKERS=4

Vector Database Optimization

# Optimize vector search (HNSW index: M sets graph connectivity,
# ef sets the search-time candidate list size; higher values trade
# speed for recall)
VECTOR_INDEX_TYPE=hnsw
VECTOR_SEARCH_PARAMS={"ef": 128, "M": 16}

Deployment

Docker Deployment

  1. Build the image

    docker build -t langchain-rag-app .

  2. Run with docker-compose

    docker-compose up -d

Production Considerations

  • Load Balancing: Use multiple API instances behind a load balancer
  • Database Scaling: Consider managed vector database services
  • Monitoring: Implement comprehensive logging and metrics
  • Security: Add authentication and rate limiting
  • Caching: Implement Redis for response caching

Monitoring and Observability

Metrics Collection

Built-in metrics:

  • Query response time
  • Document processing time
  • Vector search performance
  • LLM token usage
  • Error rates

Logging

# Structured logging with context
import structlog

logger = structlog.get_logger()
logger.info(
    "Query processed",
    query=query,
    response_time=response_time,
    documents_retrieved=len(docs),
)

License

This template is released under the MIT License. See LICENSE for details.

Project Structure

app/: Main application directory
app/api/: API route definitions
app/main.py: FastAPI application entry point
app/models/: Pydantic data models
app/rag/: RAG-specific modules
app/rag/document_processor.py: Document ingestion and processing
app/rag/generator.py: Response generation with LLM
app/rag/retriever.py: Document retrieval logic
app/rag/vector_store.py: Vector database operations
data/: Sample documents and datasets
docker-compose.yml: Docker services configuration
requirements.txt: Python dependencies
tests/: Test suite

Customization Options

Vector Database

Choose between ChromaDB, Pinecone, or Weaviate

LLM Provider

Configure OpenAI, Anthropic, or local models

Document Types

Support for PDF, Word, text, and web scraping

Embedding Model

Choose embedding model (OpenAI, Sentence Transformers, etc.)

Authentication

Add API key authentication and user management

Deployment Options

Docker

Deploy using Docker containers

  1. Build Docker image
  2. Configure environment variables
  3. Run with docker-compose
  4. Set up reverse proxy

AWS

Deploy to AWS with ECS and RDS

  1. Create ECS cluster
  2. Set up RDS for metadata
  3. Configure S3 for document storage
  4. Deploy with CloudFormation

Google Cloud

Deploy to Google Cloud Run

  1. Build container image
  2. Push to Container Registry
  3. Deploy to Cloud Run
  4. Configure Cloud Storage