GraphRAG over Legal Matters

A graph-aware retrieval pipeline that indexes 10K+ legal documents and answers case questions in roughly two seconds.

  • 10K+ docs indexed
  • +35% retrieval precision
  • −40% latency
Python · LiteLLM · Cosmos DB · Embeddings & vector search · FastAPI · Azure

Built at CloudLex

The problem

Plain vector search over a firm’s documents missed cross-document relationships and returned noisy context, hurting answer quality on real legal matters.

What I built

  • Built a GraphRAG pipeline that layers entity/relationship structure over embeddings, so retrieval follows connections between documents rather than relying only on nearest-neighbour text.
  • Routed all model calls through LiteLLM for provider-agnostic inference and centralized cost/observability.
  • Tuned chunking and retrieval to balance precision against answer latency.
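
The graph-aware retrieval described in the first bullet can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the production pipeline: the chunk ids, the toy two-dimensional embeddings, the `links` adjacency map, and the function names are all hypothetical, and a real system would query its vector and graph indexes instead of in-memory dicts.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def graph_aware_retrieve(query_vec, embeddings, links, k=2, hops=1):
    """Return chunk ids: top-k vector hits first, then their graph neighbours.

    embeddings: {chunk_id: vector}         (hypothetical in-memory vector index)
    links:      {chunk_id: [chunk_id, ...]} (hypothetical relationship edges)
    """
    ranked = sorted(
        embeddings,
        key=lambda cid: cosine(query_vec, embeddings[cid]),
        reverse=True,
    )
    seeds = ranked[:k]

    results = list(seeds)
    frontier = seeds
    for _ in range(hops):
        next_frontier = []
        for cid in frontier:
            for neighbour in links.get(cid, []):
                if neighbour not in results:
                    results.append(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return results


# Toy data, invented for illustration only.
embeddings = {
    "complaint": [1.0, 0.0],
    "deposition": [0.9, 0.1],
    "invoice": [0.0, 1.0],
    "medical_record": [0.1, 0.9],
}
# Assumed edge: both chunks mention the same party, so they are linked.
links = {"complaint": ["medical_record"]}

hits = graph_aware_retrieve([1.0, 0.0], embeddings, links, k=2)
print(hits)
```

In this toy run, "medical_record" is far from the query in embedding space, yet it is still returned because it is linked to a top vector hit. That is the behaviour flat nearest-neighbour search cannot provide.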

How it fits together

Documents → Ingestion & chunking → Embeddings + graph index (Cosmos DB) → Query → graph-aware retrieval → LLM (via LiteLLM) → Grounded answer (~2s)
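
As a rough illustration of the ingestion and chunking stage, the sketch below splits text into overlapping word windows. The function name and the `size` and `overlap` parameters are assumptions chosen for the example; the actual chunking strategy and its tuned values are not described here.

```python
def chunk_words(text, size=200, overlap=40):
    """Split text into chunks of up to `size` words, sharing `overlap` words
    between consecutive chunks. Parameter values are illustrative only."""
    if size <= 0 or overlap < 0 or overlap >= size:
        raise ValueError("require size > 0 and 0 <= overlap < size")

    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, size=4, overlap=1)
print(chunks)
```

Larger chunks carry more context per retrieval hit but add tokens to the prompt, which is one way chunk size trades precision against answer latency.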

Key technical decisions

A graph layer over pure vectors

Adding relationship structure on top of embeddings improved retrieval precision ~35% versus flat vector search on the same corpus.
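
One common way to compare two retrievers is precision@k on queries with labelled relevant chunks. The sketch below assumes the improvement is read as a relative change in precision@k, which is an assumption rather than something stated above. The chunk ids and rankings are invented toy data and do not reproduce the reported figure.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved ids that are in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = retrieved[:k]
    return sum(1 for doc_id in top if doc_id in relevant) / k


def relative_gain(baseline, improved):
    """Relative change of `improved` over `baseline` (0.5 means +50%)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline


# Toy example: hypothetical rankings for a single query.
relevant = {"a", "b", "c"}
flat_ranking = ["a", "x", "b", "y"]    # stand-in for flat vector search
graph_ranking = ["a", "b", "c", "y"]   # stand-in for graph-aware retrieval

p_flat = precision_at_k(flat_ranking, relevant, k=4)
p_graph = precision_at_k(graph_ranking, relevant, k=4)
gain = relative_gain(p_flat, p_graph)
print(p_flat, p_graph, gain)
```

In practice the metric would be averaged over a set of labelled queries on the same corpus, so both retrievers are scored against identical relevance judgements.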

LiteLLM as the model gateway

A single proxy for all providers made it trivial to switch models, enforce limits, and track token cost per feature.

Outcomes

  • +35% retrieval precision and −40% latency versus flat vector search
  • ~2-second search-to-answer on real matters
  • 10K+ documents indexed across tenants

Proprietary — source not public.

Want to talk through any of this?

jntkhandebharad@gmail.com