LangGraph Tutorial (2025): Build a Production-Ready AI Agent with FastAPI + RAG
If you’ve been following the recent explosion of AI agents, you’ve probably noticed something: most tutorials feel like incomplete demos. They show a quick chain, a basic prompt, maybe a vector search… and then leave you alone when things actually get complicated.
This guide is different. We’re building something you can trust — an AI agent powered by LangGraph, supported by RAG for grounded answers, and delivered through a clean, secure FastAPI backend. Everything is built with real-world usage in mind: structure, safety, logging, tests, Docker, deployment… the works.
By the time you finish this tutorial, you’ll have a working agent you could ship to your team or integrate into your product. And more importantly, you’ll understand why each piece is built the way it is, so you can extend it confidently later.
Let’s dive in and build something that actually feels production-ready.
What You’re Building (and Why LangGraph Makes It Better)
You’re creating a structured, controllable AI agent that understands your documents and responds intelligently. Unlike the “just call an LLM and hope for the best” approach, this agent:
- Takes user questions naturally
- Retrieves relevant chunks from your files
- Walks through a LangGraph workflow with defined steps
- Falls back gracefully when things go wrong
- Returns grounded, citation-backed answers
- Exposes everything through a FastAPI endpoint other apps can call
Think of LangGraph as the difference between:
- A single prompt hoping to handle all logic
- A proper state machine where each step is explicit
That second one is how you avoid hallucinations, gain control, and build workflows that scale.
Architecture Overview (simple but powerful)
Client → FastAPI → LangGraph Agent → Tools → Vector Database → Response
FastAPI handles the API cleanly, LangGraph runs the workflow, and the vector database stores your document embeddings. Each layer has a clear purpose, which is key when you want reliability and predictable behavior.
Before You Start (What You Need)
- Python 3.11+
- Basic terminal or command line experience
- Docker (optional but strongly recommended)
- An API key for your model/embedding provider
- A few PDF or markdown files you want your agent to learn from
1. Project Setup: Structure, Environment & Dependencies
A lot of AI devs underestimate how much project structure matters. A messy folder leads to debugging nightmares later. So let's set up a clean, scalable layout:
ai-agent/
├── app/
│ ├── api.py
│ ├── agent.py
│ ├── rag.py
│ ├── config.py
│ └── tests/
│ └── test_ask.py
├── data/
├── Dockerfile
├── requirements.txt
├── .env.example
└── README.md
Create a virtual environment:
python3 -m venv venv
source venv/bin/activate
Install the required packages:
pip install fastapi uvicorn langgraph langchain langchain-community langchain-text-splitters python-dotenv weaviate-client pydantic numpy httpx
Set up your environment variables:
MODEL_API_KEY=your_key_here
VECTOR_DB_URL=your_db_url
Important: Never hardcode API keys inside Python files. That’s how production gets messy and security gets expensive.
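The config.py module in the layout above is the natural place to centralize this. Here's a minimal sketch, assuming python-dotenv (already in the install list) and the variable names from the .env example:

# app/config.py: minimal sketch; variable names match the .env example above
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env into the process environment once, at import time

MODEL_API_KEY = os.environ["MODEL_API_KEY"]  # fail fast if the key is missing
VECTOR_DB_URL = os.getenv("VECTOR_DB_URL", "http://localhost:8080")  # sensible local default

Everything else in the app imports from config.py instead of reading os.environ directly, so there is exactly one place where secrets enter the code.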
2. Build the Agent with LangGraph
This is the fun part — turning simple functions into an intelligent workflow. LangGraph lets you break your agent into small pieces (nodes), connect them with logic (edges), and add rules that prevent chaos.
Step 1: Define Your Tools
Your agent will use three basic tools:
- retrieve() — fetch relevant chunks
- web_search() — optional fallback
- write_answer() — the final synthesis
# app/agent.py
async def retrieve(query):
    # Logic to hit vector DB
    return chunks

async def web_search(query):
    # Optional fallback tool
    return results

async def write_answer(context, question):
    # Final LLM call to synthesize answer
    return formatted_answer
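To make the first tool concrete, here is a hedged sketch of how retrieve() might query a Weaviate v4 collection. The "Docs" collection name and the embed() helper are assumptions (embed() would call your embedding provider), not part of any library API:

# Sketch of retrieve() against a Weaviate v4 collection; embed() is a hypothetical helper
import weaviate

async def retrieve(query: str, top_k: int = 5) -> list[str]:
    client = weaviate.connect_to_local()  # swap for your hosted connection helper
    try:
        docs = client.collections.get("Docs")
        response = docs.query.near_vector(near_vector=embed(query), limit=top_k)
        return [obj.properties["text"] for obj in response.objects]
    finally:
        client.close()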
Step 2: Build the Graph Flow
Think of the graph as a workflow with guardrails. A typical flow looks like:
- Node 1: Clean or rephrase the user’s question
- Node 2: Retrieve documents
- Node 3: Grade those documents for relevance
- Node 4: Generate the final answer
It sounds simple, but this structure is what separates a stable agent from an unpredictable prompt.
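Here's a minimal sketch of that flow with LangGraph's StateGraph. The node bodies are placeholders, and the state fields are one reasonable choice rather than a required schema:

# app/agent.py (continued): a minimal four-node graph; node bodies are placeholders
from typing import List, TypedDict

from langgraph.graph import END, START, StateGraph

class AgentState(TypedDict):
    question: str
    documents: List[str]
    answer: str
    sources: List[str]

async def rephrase(state: AgentState) -> dict:
    # Node 1: clean or rephrase the user's question
    return {"question": state["question"].strip()}

async def retrieve_docs(state: AgentState) -> dict:
    # Node 2: fetch relevant chunks via the retrieve() tool
    return {"documents": await retrieve(state["question"])}

async def grade(state: AgentState) -> dict:
    # Node 3: keep only chunks that actually address the question (LLM or heuristic check)
    return {"documents": state["documents"]}

async def generate(state: AgentState) -> dict:
    # Node 4: synthesize the grounded answer via write_answer()
    return {"answer": await write_answer(state["documents"], state["question"])}

graph = StateGraph(AgentState)
graph.add_node("rephrase", rephrase)
graph.add_node("retrieve", retrieve_docs)
graph.add_node("grade", grade)
graph.add_node("generate", generate)
graph.add_edge(START, "rephrase")
graph.add_edge("rephrase", "retrieve")
graph.add_edge("retrieve", "grade")
graph.add_edge("grade", "generate")
graph.add_edge("generate", END)

agent = graph.compile()

Each node receives the current state and returns only the fields it updates, which keeps every step small, testable, and easy to reorder later.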
Step 3: Add Safety Nets
Production agents need protection. Add the following (a short sketch follows the list):
- Loop limits → prevent infinite tool calls
- Timeouts → automatically cancel long-running operations
- Explicit “I don't know” answers → avoid hallucinations
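A minimal version of these safeguards, assuming the compiled agent from the previous step:

import asyncio

async def run_safely(agent, question: str) -> dict:
    # Loop limit: recursion_limit caps how many graph steps may execute.
    # Timeout: asyncio.wait_for cancels runs that take too long.
    try:
        return await asyncio.wait_for(
            agent.ainvoke({"question": question}, config={"recursion_limit": 10}),
            timeout=30,
        )
    except asyncio.TimeoutError:
        return {"answer": "I don't know; the request timed out.", "sources": []}
    except Exception:
        # Covers recursion-limit errors and unexpected tool failures
        return {"answer": "I don't know based on the available documents.", "sources": []}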
3. Add RAG: Ingestion, Chunking & Retrieval
RAG is how we keep the agent honest. Instead of letting the model “guess” an answer, we ground it in real documents.
Step 1: Load Files
# rag.py
from langchain_community.document_loaders import DirectoryLoader

def load_docs(directory):
    # Use LangChain loaders for PDF, MD, TXT (the default DirectoryLoader path needs the `unstructured` package)
    return DirectoryLoader(directory).load()
Step 2: Chunk Smartly
The two most common RAG mistakes are:
- Chunks that are too big → models miss details
- Chunks that are too small → context becomes fragmented
The sweet spot:
- Size: 500–800 tokens
- Overlap: 10–15%
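To land in that range, LangChain's RecursiveCharacterTextSplitter works well. The token-based constructor below assumes the langchain-text-splitters and tiktoken packages are installed:

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Roughly 650 tokens per chunk with ~12% overlap, in line with the guidance above
splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=650,
    chunk_overlap=80,
)
chunks = splitter.split_documents(load_docs("data"))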
Step 3: Embed & Store Chunks
Push your embeddings to your vector DB:
import weaviate

client = weaviate.connect_to_local()  # v4 client; use connect_to_weaviate_cloud(cluster_url=VECTOR_DB_URL, ...) when hosted
client.collections.get("Docs").data.insert(properties={"text": chunk_text, "source": source}, vector=embedding)
Store metadata like filenames, page numbers, or headings — it helps with transparency and citations.
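One way to do that with the v4 batch API, assuming a "Docs" collection already exists (schema creation is omitted here), chunks come from the splitter above, and embed() is a hypothetical embedding helper:

# Batch-insert chunks with their metadata so answers can cite sources later
docs_collection = client.collections.get("Docs")
with docs_collection.batch.dynamic() as batch:
    for chunk in chunks:
        batch.add_object(
            properties={
                "text": chunk.page_content,
                "source": chunk.metadata.get("source", ""),
                "page": chunk.metadata.get("page", 0),
            },
            vector=embed(chunk.page_content),  # hypothetical embedding helper
        )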
4. Expose Everything Through FastAPI
This is where your agent becomes usable. FastAPI gives you typed, clean, auto-documented endpoints.
Create the /ask Endpoint
from fastapi import FastAPI
from pydantic import BaseModel

from app.agent import agent  # the compiled LangGraph workflow

app = FastAPI()

class QuestionInput(BaseModel):
    question: str

@app.post("/ask")
async def ask(payload: QuestionInput):
    # Run the graph and return the final state
    result = await agent.ainvoke({"question": payload.question})
    return {"answer": result["answer"], "sources": result.get("sources", [])}
Add CORS Support
from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # lock this down to your frontend's exact origin in production
    allow_methods=["*"],
    allow_headers=["*"],
)
Run the Server
uvicorn app.api:app --host 0.0.0.0 --port 8000
Interactive docs at: http://localhost:8000/docs
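A quick way to smoke-test the endpoint from Python while the server is running (httpx is already in the install list); the question is just an example:

import asyncio

import httpx

async def main() -> None:
    async with httpx.AsyncClient() as client:
        res = await client.post(
            "http://localhost:8000/ask",
            json={"question": "What is in the docs?"},
            timeout=60,
        )
        print(res.json()["answer"])

asyncio.run(main())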
5. Evaluate & Observe Your Agent
A big difference between quick demos and production-grade agents is simple: testing and observability.
Basic Tests
# tests/test_ask.py
from fastapi.testclient import TestClient
from app.api import app

client = TestClient(app)

def test_basic_answer():
    res = client.post("/ask", json={"question": "What is in the docs?"})
    assert res.status_code == 200
    assert res.json()["answer"] != ""
What to Log
- How many nodes executed
- Which tools were called
- Total response time
- Whether fallback logic activated
These logs help you catch issues early and spot slowdowns before users complain.
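A lightweight way to capture those signals is a wrapper around the agent call. The state keys read here (nodes_executed, tools_called, used_fallback) are hypothetical and only exist if your nodes write them:

import logging
import time

logger = logging.getLogger("agent")

async def ask_with_logging(agent, question: str) -> dict:
    # Times the run and logs which nodes/tools fired and whether fallback kicked in
    start = time.perf_counter()
    result = await agent.ainvoke({"question": question})
    logger.info(
        "nodes=%s tools=%s fallback=%s duration_ms=%.0f",
        result.get("nodes_executed", []),
        result.get("tools_called", []),
        result.get("used_fallback", False),
        (time.perf_counter() - start) * 1000,
    )
    return result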
6. Deploy with Docker (The Easy, Reliable Way)
Containerizing your agent means you get the same behavior locally, in staging, and in production. No dependency hell. No "it works on my machine".
Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "app.api:app", "--host", "0.0.0.0", "--port", "8000"]
Build & Run
docker build -t ai-agent .
docker run -p 8000:8000 --env-file .env ai-agent
Common Pitfalls & Real Solutions
- Retrieval returns nothing: increase top-k or improve your embeddings.
- Agent hallucinating: use strict instructions plus explicit "I don't know" logic.
- Responses slow: make operations async and limit graph hops.
- CORS issues: the allowed origin must match the frontend origin exactly (scheme, host, and port); even a stray trailing slash breaks the match.
Frequently Asked Questions
Q1: Is LangGraph replacing LangChain?
No. LangGraph focuses on agent flow control (stateful graphs, branching, retries), while LangChain provides the surrounding utilities such as loaders, splitters, and model integrations. They complement each other.
Q2: Do I need a paid vector DB?
No. Start with Chroma or Weaviate’s free tier.
Q3: Can I add web search?
Yes. Tools like Tavily/Serper can act as fallback nodes.
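For example, building on the graph sketch from Step 2, a conditional edge can route to a web search node only when grading finds nothing usable. This goes before graph.compile() and replaces the fixed grade-to-generate edge; decide_after_grade is a hypothetical router function:

async def web_search_node(state: AgentState) -> dict:
    # Wraps the web_search() tool from Step 1 (e.g. Tavily or Serper under the hood)
    return {"documents": await web_search(state["question"])}

def decide_after_grade(state: AgentState) -> str:
    # Fall back to web search only when no relevant chunks survived grading
    return "generate" if state["documents"] else "web_search"

graph.add_node("web_search", web_search_node)
graph.add_conditional_edges(
    "grade",
    decide_after_grade,
    {"generate": "generate", "web_search": "web_search"},
)
graph.add_edge("web_search", "generate")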