RAG · Retrieval

GraphRAG over Legal Matters

A graph-aware retrieval pipeline that indexes 10K+ legal documents and answers case questions in roughly two seconds.

10K+ docs indexed
+35% retrieval precision
−40% latency

PythonLiteLLMCosmos DBEmbeddings & vector searchFastAPIAzure

Built at CloudLex

// problem

The problem

Plain vector search over a firm’s documents missed cross-document relationships and returned noisy context, hurting answer quality on real legal matters.

// approach

What I built

Built a GraphRAG pipeline that layers entity/relationship structure over embeddings, so retrieval follows connections between documents — not just nearest-neighbour text.
Routed all model calls through LiteLLM for provider-agnostic inference and centralized cost/observability.
Tuned chunking and retrieval to balance precision against answer latency.

// architecture

How it fits together

DocumentsIngestion & chunkingEmbeddings + graph index (Cosmos DB)Query → graph-aware retrievalLLM (via LiteLLM)Grounded answer (~2s)

// decisions

Key technical decisions

A graph layer over pure vectors

Adding relationship structure on top of embeddings improved retrieval precision ~35% versus flat vector search on the same corpus.

LiteLLM as the model gateway

A single proxy for all providers made it trivial to switch models, enforce limits, and track token cost per feature.

// outcomes

Outcomes

+35% retrieval precision and −40% latency versus flat vector search
~2-second search-to-answer on real matters
10K+ documents indexed across tenants

Proprietary — source not public.

Want to talk through any of this?

jntkhandebharad@gmail.com