RAGOps: Production-Grade RAG Platform
Personal Project
Problem
Naive RAG systems built on embedding-only retrieval suffer from poor recall on out-of-distribution queries, context pollution from irrelevant chunks, and zero visibility into why a given answer was produced. There was no systematic way to measure or improve retrieval quality over time.
Constraints
- Heterogeneous document formats (PDF, markdown, HTML) required a unified ingestion pipeline
- Query latency budget: end-to-end response under 3 seconds including reranking
- Cost per query had to remain viable for self-hosted, single-tenant use
- Evaluation required ground-truth labels — 150 QA pairs curated manually
- No managed vector database; pgvector on PostgreSQL was used to keep the stack minimal
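The unified-ingestion constraint can be sketched as a format dispatcher that normalizes each document to plain text before chunking. This is a minimal illustration, not the project's actual pipeline: the function name, format labels, and dispatch shape are assumptions, and real PDF extraction would require an external library such as pypdf.

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects visible text content, skipping script/style blocks."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def to_plain_text(raw: str, fmt: str) -> str:
    """Normalize a document to plain text ahead of chunking.

    `fmt` is one of "html", "markdown", "pdf" -- this dispatch is an
    illustrative assumption, not the project's confirmed design.
    """
    if fmt == "html":
        parser = _TextExtractor()
        parser.feed(raw)
        return " ".join(s.strip() for s in parser.parts if s.strip())
    if fmt == "markdown":
        # Markdown is near-plain text; strip common syntax markers.
        return re.sub(r"[#*`>\[\]()]", "", raw).strip()
    if fmt == "pdf":
        raise NotImplementedError("PDF extraction needs an external library (e.g. pypdf)")
    raise ValueError(f"unknown format: {fmt}")
```

The output of this step feeds the chunker, so all three formats end up in one representation regardless of source.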
Approach
Replaced single-stage dense retrieval with a three-stage pipeline: (1) dual retrieval combining pgvector ANN search with BM25-style lexical matching, (2) score fusion to merge the two candidate lists, and (3) a cross-encoder reranker applied to the top-k candidates before the context is passed to the LLM. The chunking strategy was switched from fixed-size windows to semantic boundaries to improve chunk coherence. A fallback gate rejects low-confidence queries instead of hallucinating an answer. Evaluation was embedded in the development loop: every pipeline change was measured against the 150-query benchmark before merging.
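The score-fusion step could be implemented as reciprocal rank fusion (RRF), a common way to merge a dense and a lexical ranking. RRF and its damping constant are assumptions here; the project description does not name its fusion method.

```python
from collections import defaultdict

def rrf_fuse(dense_ids, lexical_ids, k=60):
    """Merge two ranked candidate lists with reciprocal rank fusion.

    Each doc's fused score is the sum of 1 / (k + rank) over the
    rankings it appears in. k=60 is the conventional RRF constant;
    both the method and the constant are assumptions, not confirmed
    details of this project.
    """
    scores = defaultdict(float)
    for ranking in (dense_ids, lexical_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; these go on to the cross-encoder.
    return sorted(scores, key=scores.get, reverse=True)

# Docs ranked highly by both retrievers bubble to the top.
fused = rrf_fuse(["a", "b", "c"], ["c", "a", "d"])  # -> ["a", "c", "b", "d"]
```

Because RRF uses only ranks, it sidesteps the problem of calibrating cosine-similarity scores against BM25 scores, which live on incompatible scales.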
Architecture

Metrics
| Metric | Baseline | Achieved |
|---|---|---|
| Recall@10 | ~58% | ~81% |
| Answer precision (manual) | 62% | 84% |
| Irrelevant context rate | 31% | 11% |
| Avg query latency | 1.1 s | 2.4 s (reranker added) |
| Benchmark queries | 0 | 150 QA pairs |
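Recall@10 in the table above can be computed against the 150-pair benchmark along these lines. The data shapes and the `retrieve` callable are illustrative, not the project's actual evaluation API.

```python
def recall_at_k(benchmark, retrieve, k=10):
    """Fraction of benchmark queries whose gold chunk appears in the
    top-k retrieved results.

    `benchmark` is a list of (query, gold_chunk_id) pairs and
    `retrieve` is a callable returning ranked chunk ids -- both
    shapes are assumptions for illustration.
    """
    hits = sum(
        1 for query, gold_id in benchmark
        if gold_id in retrieve(query)[:k]
    )
    return hits / len(benchmark)
```

Running this after every pipeline change is what makes the baseline-vs-achieved comparison in the table possible: each candidate change produces one number on a fixed query set.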
Product Impact
RAGOps functions as a self-hostable knowledge-base Q&A system for domain-specific document corpora. The observability dashboard lets an operator debug retrieval failures without re-running experiments manually. The evaluation framework enables confident iteration — any retrieval change is quantified before deployment, treating the LLM application as infrastructure rather than a prototype.