At OffplanProperties.ai, I shipped a RAG system that indexed 300,000+ long-form real estate pages. The v1 was naive RAG: embed, top-k, stuff into context, generate. It worked in demos and failed in production.
Over six months, I rebuilt it as Agentic RAG, and the numbers speak for themselves: retrieval precision from 0.62 to 0.91, hallucination rate from 11% to under 2%, p95 latency still under 800ms.
Why Naive RAG Fails in Production
Naive RAG has three silent killers:
- Embedding drift: user queries and document chunks live in different vocabularies ("2-bed apartment near metro" vs. legal descriptions full of "ensuite" and "RERA")
- Chunking blindness: important context gets split across chunks and never retrieved together
- No recovery: when retrieval fails, the LLM hallucinates confidently instead of asking for clarification
Step 1: Hybrid Search (BM25 + Vector)
My first upgrade was combining BM25 keyword search with dense vector search using reciprocal rank fusion. This alone moved precision from 0.62 to 0.74.
# RRF fusion: rank is a document's 1-based position in each result list
from collections import defaultdict
scores = defaultdict(float)
for results in (bm25_results, vector_results):
    for rank, doc in enumerate(results, start=1):
        scores[doc] += 1 / (60 + rank)
Step 2: Cross-Encoder Rerank
After hybrid search gives you the top-50, rerank with a cross-encoder. Cross-encoders are slow at scale but precise, and you're only running them on 50 candidates.
I used BAAI/bge-reranker-large. Added 120ms to p95 but moved precision from 0.74 to 0.85.
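The rerank step is a thin wrapper around any pairwise scorer. A minimal sketch, where `score_fn` stands in for the actual model call (with sentence-transformers, that would be `CrossEncoder("BAAI/bge-reranker-large").predict` over (query, doc) pairs); the `rerank` helper itself is illustrative, not my exact production code:

```python
# Rerank wrapper: score each (query, doc) pair, keep the best top_n.
# score_fn is any pairwise scorer, e.g. a sentence-transformers CrossEncoder.
def rerank(query, candidates, score_fn, top_n=10):
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]
```

Because the wrapper is scorer-agnostic, you can swap rerankers (or stub one in tests) without touching the retrieval pipeline.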
Step 3: Query Transformation
Here's where it gets agentic. Before retrieval, a small LLM rewrites the user's query:
- Expansion: "cheap 2BR Marina" → "affordable two-bedroom apartments in Dubai Marina under AED 1.5M with sea view"
- Decomposition: "which new projects have pools and gyms near metro" → 3 sub-queries
- HyDE: hypothetical document generation to bridge the query/document vocabulary gap
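Each transformation is just a prompt template plus one call to a small LLM. A sketch, where `llm` is any prompt-in, completion-out callable; the templates and the `transform_query` helper are illustrative, not my production prompts:

```python
# One template per transformation mode; `llm` is any callable that
# takes a prompt string and returns the model's completion string.
PROMPTS = {
    "expand": "Rewrite this real-estate search query with explicit location, "
              "budget, and amenity terms:\n{query}",
    "decompose": "Split this query into independent sub-queries, one per line:\n{query}",
    "hyde": "Write a short property listing that would answer this query:\n{query}",
}

def transform_query(query, llm, mode="expand"):
    out = llm(PROMPTS[mode].format(query=query))
    if mode == "decompose":
        # Decomposition yields a list of sub-queries, one per output line.
        return [q.strip() for q in out.splitlines() if q.strip()]
    return out.strip()
```

Sub-queries from "decompose" are retrieved independently and their results fused, same as the hybrid-search step.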
Step 4: The Agentic Loop
The retrieval engine itself became an agent. Given a query, it:
- Decides whether to search, reformulate, or answer directly
- Runs retrieval, scores confidence
- If confidence is low, reformulates and retries (max 3 hops)
- If still low, asks the user a clarifying question instead of hallucinating
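The loop above fits in a few lines once the components are pluggable. A sketch with stand-in function parameters; the 0.7 confidence threshold is an illustrative default, not the value we shipped:

```python
def agentic_answer(query, retrieve, confidence, reformulate,
                   answer, ask_clarifying, threshold=0.7, max_hops=3):
    """Retrieve-score-retry loop; gives up and asks the user after max_hops."""
    q = query
    for _ in range(max_hops):
        docs = retrieve(q)
        if confidence(q, docs) >= threshold:
            return answer(q, docs)
        q = reformulate(q)  # low confidence: rewrite the query and retry
    return ask_clarifying(query)  # still low: ask instead of hallucinating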
This is what people mean by "Agentic RAG." It's not a buzzword; it's the difference between a system that is confidently wrong and one that knows when to stop.
Step 5: Evaluation Harness (RAGAS + LangSmith)
None of this matters if you can't measure it. I built an eval harness with RAGAS for retrieval metrics (context precision, recall, faithfulness) and LangSmith for end-to-end traces with human feedback.
Every PR runs 200 golden questions. If faithfulness drops below 0.9 or latency exceeds 1s, the build fails. Saved me from shipping three silent regressions in the first month.
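The gate itself is a small check over per-question results: RAGAS computes the faithfulness scores and LangSmith supplies the traces, but the pass/fail logic is simple. A sketch; the result-dict format is an assumption, and aggregating mean faithfulness with worst-case latency is one plausible reading of the thresholds above:

```python
def eval_gate(results, min_faithfulness=0.9, max_latency_s=1.0):
    """results: one dict per golden question, with 'faithfulness' (0-1,
    e.g. from RAGAS) and 'latency_s'. Returns False to fail the build
    if mean faithfulness drops below the floor or any single question
    blows the latency budget."""
    mean_faith = sum(r["faithfulness"] for r in results) / len(results)
    worst_latency = max(r["latency_s"] for r in results)
    return mean_faith >= min_faithfulness and worst_latency <= max_latency_s
```

Wire this into CI so a failing gate blocks the merge, and regressions become loud instead of silent.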
Takeaways
Naive RAG is a demo. Agentic RAG is production. The gap between them is measurement.
If you're building RAG in 2026, start with hybrid search + rerank + eval harness on day one. Don't ship without them.
