RAG · Retrieval
GraphRAG over Legal Matters
A graph-aware retrieval pipeline that indexes 10K+ legal documents and answers case questions in roughly two seconds.
- 10K+ docs indexed
- +35% retrieval precision
- −40% latency
PythonLiteLLMCosmos DBEmbeddings & vector searchFastAPIAzure
Built at CloudLex
// problem
The problem
Plain vector search over a firm’s documents missed cross-document relationships and returned noisy context, hurting answer quality on real legal matters.
// approach
What I built
- Built a GraphRAG pipeline that layers entity/relationship structure over embeddings, so retrieval follows connections between documents — not just nearest-neighbour text.
- Routed all model calls through LiteLLM for provider-agnostic inference and centralized cost/observability.
- Tuned chunking and retrieval to balance precision against answer latency.
// architecture
How it fits together
DocumentsIngestion & chunkingEmbeddings + graph index (Cosmos DB)Query → graph-aware retrievalLLM (via LiteLLM)Grounded answer (~2s)
// decisions
Key technical decisions
A graph layer over pure vectors
Adding relationship structure on top of embeddings improved retrieval precision ~35% versus flat vector search on the same corpus.
LiteLLM as the model gateway
A single proxy for all providers made it trivial to switch models, enforce limits, and track token cost per feature.
// outcomes
Outcomes
- +35% retrieval precision and −40% latency versus flat vector search
- ~2-second search-to-answer on real matters
- 10K+ documents indexed across tenants
Proprietary — source not public.
Want to talk through any of this?
jntkhandebharad@gmail.com