Tutorials · Dec 6, 2025 · 6 min read

Build a Production-Ready AI Agent With FastAPI + RAG (2025 Guide)


If you’ve been following the recent explosion of AI agents, you’ve probably noticed something: most tutorials feel like incomplete demos. They show a quick chain, a basic prompt, maybe a vector search… and then leave you alone when things actually get complicated.

This guide is different. We’re building something you can trust — an AI agent powered by LangGraph, supported by RAG for grounded answers, and delivered through a clean, secure FastAPI backend. Everything is built with real-world usage in mind: structure, safety, logging, tests, Docker, deployment… the works.

By the time you finish this tutorial, you’ll have a working agent you could ship to your team or integrate into your product. And more importantly, you’ll understand why each piece is built the way it is, so you can extend it confidently later.

Let’s dive in and build something that actually feels production-ready.


What You’re Building (and Why LangGraph Makes It Better)

You’re creating a structured, controllable AI agent that understands your documents and responds intelligently. Unlike the “just call an LLM and hope for the best” approach, this agent:

  • Takes user questions naturally
  • Retrieves relevant chunks from your files
  • Walks through a LangGraph workflow with defined steps
  • Falls back gracefully when things go wrong
  • Returns grounded, citation-backed answers
  • Exposes everything through a FastAPI endpoint other apps can call

Think of LangGraph as the difference between:

  • A single prompt hoping to handle all logic
  • A proper state machine where each step is explicit

That second one is how you avoid hallucinations, gain control, and build workflows that scale.

Architecture Overview (simple but powerful)

Client → FastAPI → LangGraph Agent → Tools → Vector Database → Response

FastAPI handles the API cleanly, LangGraph runs the workflow, and the vector database stores your document embeddings. Each layer has a clear purpose, which is key when you want reliability and predictable behavior.

Before You Start (What You Need)

  • Python 3.11+
  • Basic terminal or command line experience
  • Docker (optional but strongly recommended)
  • Your model/embedding provider key
  • A few PDF or markdown files you want your agent to learn from

1. Project Setup: Structure, Environment & Dependencies

A lot of AI devs underestimate how much project structure matters. A messy folder leads to debugging nightmares later. So let's set up a clean, scalable layout:

ai-agent/
├── app/
│   ├── api.py
│   ├── agent.py
│   ├── rag.py
│   ├── config.py
│   └── tests/
│       └── test_ask.py
├── data/
├── Dockerfile
├── requirements.txt
├── .env.example
└── README.md

Create a virtual environment:

python3 -m venv venv
source venv/bin/activate

Install the required packages:

pip install fastapi uvicorn langgraph langchain python-dotenv weaviate-client pydantic numpy httpx

Set up your environment variables:

MODEL_API_KEY=your_key_here
VECTOR_DB_URL=your_db_url

Important: Never hardcode API keys inside Python files. That’s how production gets messy and security gets expensive.
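One way to centralize those variables is a small app/config.py. This is a sketch: the localhost fallback URL is an assumption for local development, and python-dotenv (already in the install list) is treated as optional so the module also works with plain environment variables.

```python
# app/config.py
import os

try:
    from dotenv import load_dotenv
    load_dotenv()  # pull MODEL_API_KEY / VECTOR_DB_URL out of .env if present
except ImportError:
    pass  # fall back to plain environment variables

MODEL_API_KEY = os.getenv("MODEL_API_KEY", "")
VECTOR_DB_URL = os.getenv("VECTOR_DB_URL", "http://localhost:8080")  # assumed local default
```

In a real deployment you'd likely raise on a missing MODEL_API_KEY so the app fails fast at startup instead of mid-request.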


2. Build the Agent with LangGraph

This is the fun part — turning simple functions into an intelligent workflow. LangGraph lets you break your agent into small pieces (nodes), connect them with logic (edges), and add rules that prevent chaos.

Step 1: Define Your Tools

Your agent will use three basic tools:

  • retrieve() — fetch relevant chunks
  • web_search() — optional fallback
  • write_answer() — the final synthesis
# app/agent.py

async def retrieve(query: str) -> list[str]:
    # Query the vector DB and return the top-k matching chunks
    ...

async def web_search(query: str) -> list[str]:
    # Optional fallback when retrieval comes up empty
    ...

async def write_answer(context: list[str], question: str) -> str:
    # Final LLM call that synthesizes a grounded, cited answer
    ...

Step 2: Build the Graph Flow

Think of the graph as a workflow with guardrails. A typical flow looks like:

  • Node 1: Clean or rephrase the user’s question
  • Node 2: Retrieve documents
  • Node 3: Grade those documents for relevance
  • Node 4: Generate the final answer

This sounds simple, but this structure is what separates a stable agent from an unpredictable prompt.

Step 3: Add Safety Nets

Production agents need protection. Add:

  • Loop limits → prevent infinite tool calls
  • Timeouts → automatically cancel long-running operations
  • Explicit “I don't know” answers → avoid hallucinations
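Here's a minimal, framework-free sketch of the first two guardrails (the name `run_with_guards` is illustrative, not a LangGraph API): a hard cap on loop iterations plus a per-step timeout, with an explicit refusal when the cap is hit.

```python
import asyncio

MAX_HOPS = 5         # hard cap on tool-call loops
STEP_TIMEOUT_S = 30  # cancel any single step that runs longer than this

async def run_with_guards(step, state: dict) -> dict:
    for _ in range(MAX_HOPS):
        state = await asyncio.wait_for(step(state), timeout=STEP_TIMEOUT_S)
        if state.get("done"):
            return state
    # Loop limit hit: refuse explicitly instead of guessing
    state["answer"] = "I don't know."
    state["done"] = True
    return state
```

The same pattern maps onto LangGraph via recursion limits and per-node timeouts; the point is that both bounds exist before the agent meets real traffic.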

3. Add RAG: Ingestion, Chunking & Retrieval

RAG is how we keep the agent honest. Instead of letting the model “guess” an answer, we ground it in real documents.

Step 1: Load Files

# app/rag.py

def load_docs(directory: str) -> list:
    # Use LangChain loaders (e.g. PyPDFLoader, TextLoader) for PDF, MD, TXT
    ...
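If you'd rather not pull in loaders for plain-text formats, a dependency-free version covering .md and .txt is enough to get started (PDFs still need a real loader; the `load_text_docs` name and record shape are illustrative):

```python
from pathlib import Path

def load_text_docs(directory: str) -> list[dict]:
    """Read every .md/.txt file under `directory` into {source, text} records."""
    docs = []
    for path in sorted(Path(directory).rglob("*")):
        if path.suffix.lower() in {".md", ".txt"}:
            docs.append({"source": path.name, "text": path.read_text(encoding="utf-8")})
    return docs
```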

Step 2: Chunk Smartly

The two most common RAG mistakes are:

  • Chunks that are too big → models miss details
  • Chunks that are too small → context becomes fragmented

The sweet spot:

  • Size: 500–800 tokens
  • Overlap: 10–15%
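Those numbers translate into a simple sliding window. The sketch below uses whitespace-split words as a rough stand-in for model tokens (a real pipeline would count with an actual tokenizer); 80 of 600 gives roughly 13% overlap, inside the recommended band.

```python
def chunk_words(text: str, size: int = 600, overlap: int = 80) -> list[str]:
    """Split text into ~size-word chunks, each overlapping its neighbor by ~13%."""
    words = text.split()
    step = size - overlap  # advance by size minus overlap each chunk
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]
```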

Step 3: Embed & Store Chunks

Push your embeddings to your vector DB:

# Pseudocode — exact calls depend on your weaviate-client version (v4-style shown)
client = weaviate.connect_to_local()  # or connect_to_custom(...) for a remote VECTOR_DB_URL
docs = client.collections.get("Docs")
docs.data.insert(properties={"text": chunk_text, **metadata}, vector=embedding)

Store metadata like filenames, page numbers, or headings — it helps with transparency and citations.


4. Expose Everything Through FastAPI

This is where your agent becomes usable. FastAPI gives you typed, clean, auto-documented endpoints.

Create the /ask Endpoint

from fastapi import FastAPI
from pydantic import BaseModel

from app import agent  # your compiled LangGraph workflow from step 2

app = FastAPI()

class QuestionInput(BaseModel):
    question: str

@app.post("/ask")
async def ask(payload: QuestionInput):
    # Run the graph and return the grounded answer with its sources
    answer = await agent.run(payload.question)
    return {"answer": answer.text, "sources": answer.sources}

Add CORS Support

from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # fine for local dev; lock to your frontend's origin in production
    allow_methods=["*"],
    allow_headers=["*"],
)

Run the Server

uvicorn app.api:app --host 0.0.0.0 --port 8000

Interactive docs at: http://localhost:8000/docs


5. Evaluate & Observe Your Agent

A big difference between quick demos and production-grade agents is simple: testing and observability.

Basic Tests

# app/tests/test_ask.py

from fastapi.testclient import TestClient

from app.api import app

client = TestClient(app)

def test_basic_answer():
    res = client.post("/ask", json={"question": "What is in the docs?"})
    assert res.status_code == 200
    assert res.json()["answer"] != ""

What to Log

  • How many nodes executed
  • Which tools were called
  • Total response time
  • Did fallback logic activate?

These logs help you catch issues early and spot slowdowns before users complain.
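One lightweight way to capture those four signals is a single structured record per request, emitted as one JSON line (function and field names here are illustrative, not from any framework):

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("agent")

def build_run_record(nodes: list[str], tools: list[str],
                     started: float, used_fallback: bool) -> dict:
    """Assemble one structured record describing a single agent run."""
    return {
        "nodes_executed": len(nodes),
        "tools_called": tools,
        "latency_ms": round((time.perf_counter() - started) * 1000),
        "fallback_activated": used_fallback,
    }

def log_run(record: dict) -> None:
    # One JSON line per request keeps logs grep- and dashboard-friendly
    logger.info(json.dumps(record))
```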


6. Deploy with Docker (The Easy, Reliable Way)

Containerizing your agent means you get the same behavior locally, in staging, and in production. No dependency hell. No "it works on my machine".

Dockerfile

FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["uvicorn", "app.api:app", "--host", "0.0.0.0", "--port", "8000"]

Build & Run

docker build -t ai-agent .
docker run -p 8000:8000 --env-file .env ai-agent

Common Pitfalls & Real Solutions

Retrieval returns nothing: Increase top-k or improve embeddings.

Agent hallucinating: Use strict instructions + "I don't know" logic.

Slow responses: Make I/O-bound steps (retrieval, model calls) async and cap the number of graph hops.

CORS issues: Make sure frontend origin matches exactly — even missing slashes matter.


Frequently Asked Questions

Q1: Is LangGraph replacing LangChain?
No. LangGraph focuses on agent flow control; LangChain handles utilities. They complement each other.

Q2: Do I need a paid vector DB?
No. Start with Chroma or Weaviate’s free tier.

Q3: Can I add web search?
Yes. Tools like Tavily/Serper can act as fallback nodes.
