

RAG in production

Retrieval-augmented generation is the most useful pattern in applied AI: it grounds a model in your data so it answers from facts, not vibes. This is the path to building one that survives real users.

New to the term? Start with the definition of RAG in the glossary.

01 The production pipeline
01
Ingest

Pull documents from your sources.

02
Chunk

Split with structure in mind, not fixed character counts.

03
Embed

Turn chunks into vectors.

04
Store

A vector database (pgvector works great) with metadata such as source and section, so you can filter results and cite them.

05
Retrieve

Vector search for the top-k relevant chunks.

06
Rerank

A second, more precise scoring pass (often a cross-encoder) that keeps only the best few.

07
Ground

Build the prompt from retrieved context.

08
Cite

Always return sources with the answer.
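The eight steps above can be sketched end to end in plain Python. Treat this as a toy under loud assumptions, not a production design: `embed` is a hashed bag-of-words stand-in for a real embedding model, `rerank` uses word overlap in place of a cross-encoder, an in-memory list stands in for the vector database, and every function name and the sample document are hypothetical.

```python
from __future__ import annotations

import math
import re
import zlib
from dataclasses import dataclass
from typing import Optional

DIM = 256  # toy embedding width (assumption; real embedding models differ)


@dataclass
class Chunk:
    source: str   # document id, kept as metadata so answers can cite it
    heading: str  # section heading the chunk came from
    text: str
    vector: Optional[list[float]] = None


def chunk_by_headings(source: str, markdown: str) -> list[Chunk]:
    """Structure-aware chunking: split a Markdown doc at headings, one chunk per section."""
    chunks: list[Chunk] = []
    heading, body = "(intro)", []
    for line in markdown.splitlines() + ["#"]:  # trailing "#" sentinel flushes the last section
        if line.startswith("#"):
            if any(b.strip() for b in body):
                chunks.append(Chunk(source, heading, "\n".join(body).strip()))
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    return chunks


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> list[float]:
    """Toy embedding: hashed bag-of-words, L2-normalized. Stand-in for a real model."""
    vec = [0.0] * DIM
    for tok in tokenize(text):
        vec[zlib.crc32(tok.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def retrieve(query: str, index: list[Chunk], k: int = 4) -> list[Chunk]:
    """Vector search: top-k chunks by cosine similarity (vectors are unit length)."""
    q = embed(query)

    def similarity(c: Chunk) -> float:
        return sum(a * b for a, b in zip(q, c.vector))  # assumes vectors were set at ingest

    return sorted(index, key=similarity, reverse=True)[:k]


def rerank(query: str, candidates: list[Chunk], n: int = 2) -> list[Chunk]:
    """Second pass: toy word-overlap score standing in for a cross-encoder reranker."""
    q = set(tokenize(query))

    def overlap(c: Chunk) -> float:
        return len(q & set(tokenize(c.heading + " " + c.text))) / max(len(q), 1)

    return sorted(candidates, key=overlap, reverse=True)[:n]


def build_prompt(query: str, chunks: list[Chunk]) -> str:
    """Ground the model: number each source so the answer can cite it as [1], [2], ..."""
    context = "\n\n".join(
        f"[{i}] ({c.source} / {c.heading})\n{c.text}" for i, c in enumerate(chunks, 1)
    )
    return (
        "Answer using only the sources below and cite them like [1]. "
        "If the answer is not in the sources, say so.\n\n"
        f"{context}\n\nQuestion: {query}"
    )


# Ingest: a hypothetical one-document corpus.
docs = {"handbook.md": "# Refunds\nRefunds are issued within 14 days.\n"
                       "# Shipping\nOrders ship in 2 business days."}
index = [c for src, text in docs.items() for c in chunk_by_headings(src, text)]
for c in index:
    c.vector = embed(c.heading + " " + c.text)  # Embed + Store (in memory here)

question = "How long do refunds take?"
top = rerank(question, retrieve(question, index, k=4), n=1)
prompt = build_prompt(question, top)       # send this to your LLM of choice
sources = sorted({c.source for c in top})  # Cite: return these alongside the answer
print(prompt)
```

Swapping in real components means replacing `embed` with an embedding model call, the list with your vector store, and `rerank` with a cross-encoder; the shape of the pipeline stays the same.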

A demo stops at step 8. Production adds evals, observability, retries, cost budgets, and prompt-injection defenses. For the full build, see How to build a production RAG app.
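One concrete form of a cost budget is capping how much retrieved context goes into each prompt. The sketch below assumes chunks arrive already sorted best-first (for example, after reranking) and approximates tokens as whitespace-separated words; that approximation and the function names are hypothetical, and a real system would count with the model's own tokenizer.

```python
def approx_tokens(text: str) -> int:
    # Rough assumption: about one token per word. Use your model's tokenizer in practice.
    return len(text.split())


def pack_context(chunks: list[str], budget: int) -> list[str]:
    """Greedily keep best-first chunks while the total stays within the token budget."""
    packed: list[str] = []
    used = 0
    for chunk in chunks:
        cost = approx_tokens(chunk)
        if used + cost > budget:
            continue  # skip this one; a shorter, lower-ranked chunk may still fit
        packed.append(chunk)
        used += cost
    return packed


ranked = ["a b c d", "e f g h i j", "k l"]  # hypothetical chunks, best first
kept = pack_context(ranked, budget=7)
print(kept)  # ['a b c d', 'k l']
```

Skipping an oversized chunk lets a shorter, lower-ranked one use the leftover space; breaking out of the loop instead would preserve strict rank order at the cost of some unused budget.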

Frequently asked questions

Is RAG still relevant with long context windows?

Yes. Even with large context windows, RAG is cheaper, faster, and more accurate for large or changing corpora — you retrieve only what's relevant instead of paying to stuff everything into every prompt, and you get citations.
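A back-of-the-envelope comparison makes the cost argument concrete. The corpus size, chunk size, k, and question length below are hypothetical; substitute your own. Since input is typically billed per token, cost per query scales by the same ratio whatever the price.

```python
def input_tokens(context_tokens: int, question_tokens: int = 50) -> int:
    """Input tokens sent for one query: the context plus the user's question."""
    return context_tokens + question_tokens


# Hypothetical sizes: a 400,000-token corpus vs. top-5 retrieved chunks of ~500 tokens each.
stuffed = input_tokens(400_000)
retrieved = input_tokens(5 * 500)
ratio = stuffed / retrieved
print(stuffed, retrieved, round(ratio))  # 400050 2550 157
```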

RAG or fine-tuning?

Use RAG for facts that are fresh, private, or changing; use fine-tuning for consistent style, format, or task behavior. Many production systems combine both. See our RAG vs fine-tuning comparison.

Which vector database should I use?

Start with pgvector if you already run Postgres — it's simple and production-capable. Reach for a dedicated store (Pinecone, Qdrant, Weaviate) when scale, filtering, or hybrid search demands it.
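Hybrid search usually means merging a keyword ranking (such as Postgres full-text search or BM25) with a vector ranking. If you stay on Postgres, one common way to merge them in application code is reciprocal rank fusion (RRF), which scores each document by summing 1 / (k + rank) across the lists. The sketch below uses k = 60, a widely used default; the document ids and both input rankings are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several best-first rankings of document ids into one ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # higher-ranked hits contribute more
    return sorted(scores, key=lambda d: scores[d], reverse=True)


keyword_hits = ["d3", "d1", "d7"]  # e.g. from full-text search (hypothetical)
vector_hits = ["d1", "d4", "d3"]   # e.g. from pgvector similarity search (hypothetical)
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['d1', 'd3', 'd4', 'd7']
```

RRF only uses ranks, so it sidesteps the problem that keyword scores and cosine similarities live on different scales.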

How do I evaluate a RAG system?

Build a labeled test set and measure retrieval quality (did the right chunk get fetched?) separately from answer quality. Debug retrieval first — most wrong answers come from bad retrieval, not the model.
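Measuring retrieval separately can be as small as two numbers over the labeled set: hit rate@k (did any gold chunk appear in the top k? often called recall@k when each question has one gold chunk) and MRR@k (how high the first gold chunk ranked). In this sketch the retriever is a hypothetical canned lookup; in practice you would pass your real retrieval function.

```python
from typing import Callable


def evaluate_retrieval(
    test_set: list[tuple[str, set[str]]],
    retrieve: Callable[[str], list[str]],
    k: int = 5,
) -> dict[str, float]:
    """Compute hit rate@k and MRR@k for (question, gold chunk ids) pairs."""
    hits, rr_total = 0, 0.0
    for question, gold in test_set:
        results = retrieve(question)[:k]
        for rank, chunk_id in enumerate(results, start=1):
            if chunk_id in gold:
                hits += 1
                rr_total += 1.0 / rank  # reciprocal rank of the first gold hit
                break
    n = len(test_set)
    return {"hit_rate@k": hits / n, "mrr@k": rr_total / n}


# Hypothetical labeled set and canned retriever output.
test_set = [("q1", {"c2"}), ("q2", {"c9"}), ("q3", {"c5"})]
canned = {"q1": ["c2", "c4"], "q2": ["c1", "c9"], "q3": ["c7", "c8"]}
metrics = evaluate_retrieval(test_set, lambda q: canned[q], k=5)
print(metrics)
```

If hit rate is low, fix chunking, embeddings, or filters before touching the prompt or the model.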
