Agentic AI · Multi-tenant Platform

Agentic AI Platform for Legal Workflows

A multi-tenant platform where every legal matter has an AI assistant — a supervisor agent delegates to specialist sub-agents and offloads long-running automation to a worker tier.

Supervisor + sub-agents
Strict per-tenant isolation
Queue-backed workers

Python · FastAPI · Multi-agent orchestration · Azure Service Bus · PostgreSQL + pgvector · LiteLLM · Redis · Docker · Bicep · Azure DevOps

Built at CloudLex

// problem

The problem

Legal teams need an assistant that can both answer questions about a specific matter and carry out long, multi-step automation (drafting, document ingestion) — without one slow task blocking the live chat, and without ever leaking data between firms.

// approach

What I built

A conversation supervisor agent runs per request and routes to specialist sub-agents (case, medical, workflow) for grounded answers, returning within seconds.
Long-running work is handed to a separate worker tier over a message queue, so the chat path stays fast and the heavy path scales independently.
Retrieval is grounded per matter using embeddings stored in Postgres/pgvector, with all model calls routed through a gateway (LiteLLM) for provider-agnostic inference.
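
To make the per-matter retrieval concrete, here is a minimal sketch of how a similarity query could be scoped to a single firm and matter. The table `matter_chunks`, its columns, and the helper `build_matter_search` are hypothetical names chosen for illustration, and the `%s` placeholders assume a psycopg-style driver. The `<=>` operator is pgvector's cosine-distance operator; the actual schema and distance metric are not public and may differ.

```python
from typing import Sequence


def build_matter_search(
    firm_id: str,
    matter_id: str,
    query_embedding: Sequence[float],
    k: int = 5,
) -> tuple[str, tuple]:
    """Return (sql, params) for a top-k similarity search scoped to one matter.

    Hypothetical schema: matter_chunks(chunk_id, firm_id, matter_id, content, embedding).
    """
    # pgvector accepts a bracketed text literal such as '[0.1,0.2]' cast to vector.
    vector_literal = "[" + ",".join(str(float(x)) for x in query_embedding) + "]"
    sql = (
        "SELECT chunk_id, content "
        "FROM matter_chunks "
        "WHERE firm_id = %s AND matter_id = %s "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    # Values travel as bound parameters, never interpolated into the SQL text.
    return sql, (firm_id, matter_id, vector_literal, k)
```

The tenant and matter filters sit in the `WHERE` clause, so the nearest-neighbour ordering only ever ranks rows that already belong to the requesting matter.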

// architecture

How it fits together

User chat (SSE) → FastAPI API tier → Supervisor agent → Specialist sub-agents → Queue (Service Bus) → Worker tier (drafting / ingestion) → Postgres + pgvector / Blob → Result posted back to chat

Fast conversation tier in front, queue-decoupled worker tier behind.

// decisions

Key technical decisions

Tenant isolation at the auth boundary

The firm identifier is fixed at authentication and baked into every tool call — never an LLM-controllable parameter — so cross-firm data leakage is structurally impossible rather than prompt-dependent.
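
One way to get this property is to bind the firm identifier into each tool when the tool is constructed, so it never appears in the signature the model sees. The sketch below is an illustration under assumptions, not the platform's code: `AuthContext`, `make_list_documents_tool`, and the in-memory `DOCUMENTS` store are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical in-memory store standing in for the real database.
DOCUMENTS = {
    ("firm-a", "matter-1"): ["Intake notes"],
    ("firm-b", "matter-1"): ["Settlement draft"],
}


@dataclass(frozen=True)
class AuthContext:
    # Set once from the verified credential, never from model output.
    firm_id: str


def make_list_documents_tool(ctx: AuthContext) -> Callable[[str], list[str]]:
    """Build a tool whose only model-visible argument is the matter id."""

    def list_documents(matter_id: str) -> list[str]:
        # firm_id comes from the closure, so a tool call cannot override it.
        return DOCUMENTS.get((ctx.firm_id, matter_id), [])

    return list_documents
```

A tool built for `firm-a` returns only that firm's documents even when another firm uses the same matter id, because the lookup key always includes the bound `firm_id`.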

No agent-to-agent calls

Sub-agents only ever talk to their supervisor. That keeps the control flow auditable and prevents runaway agent chains.
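
A minimal sketch of that shape, assuming sub-agents can be modelled as plain functions: each one receives a question and returns an answer, and is never handed a reference to the supervisor or to a sibling. The `Supervisor` class, the agent functions, and the keyword routing are hypothetical placeholders; a real supervisor would let the model choose the specialist.

```python
from typing import Callable

# A sub-agent takes a question and returns an answer; it holds no agent references.
SubAgent = Callable[[str], str]


def case_agent(question: str) -> str:
    return f"[case] {question}"


def medical_agent(question: str) -> str:
    return f"[medical] {question}"


class Supervisor:
    def __init__(self, agents: dict[str, SubAgent]) -> None:
        self._agents = agents
        self.trace: list[str] = []  # ordered record of every delegation

    def route(self, question: str) -> str:
        # Placeholder keyword routing, standing in for a model-driven choice.
        return "medical" if "injury" in question.lower() else "case"

    def handle(self, question: str) -> str:
        name = self.route(question)
        self.trace.append(name)  # every hop is recorded in one place
        return self._agents[name](question)
```

Because every delegation passes through `handle`, the trace is a complete record of the control flow, and a sub-agent has no way to start a chain of its own.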

Long work always goes through the queue

A sub-agent never blocks on a multi-minute job; the only path to heavy work is an explicit enqueue, so the request tier always returns quickly.
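
The enqueue-and-return pattern can be sketched with the standard library alone. Here `queue.Queue` is an in-memory stand-in for the real message queue, and `enqueue_job`, `worker_step`, and the message fields are hypothetical names for illustration only.

```python
import json
import queue
import uuid

# In-memory stand-in for the message queue between the two tiers.
job_queue: "queue.Queue[str]" = queue.Queue()


def enqueue_job(firm_id: str, matter_id: str, kind: str) -> str:
    """Request tier: put a job message on the queue and return its id at once."""
    job_id = str(uuid.uuid4())
    message = {"job_id": job_id, "firm_id": firm_id, "matter_id": matter_id, "kind": kind}
    job_queue.put(json.dumps(message))
    return job_id


def worker_step() -> dict:
    """Worker tier: take one message off the queue and process it."""
    message = json.loads(job_queue.get_nowait())
    # The multi-minute drafting or ingestion work would run here.
    message["status"] = "done"
    return message
```

The request tier only ever calls `enqueue_job`, which does no heavy work itself, so its latency does not depend on how long the job takes.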

Separate ORM models and API contracts

Database models are never returned directly from routes — typed response objects keep the API contract stable and safe to evolve.
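
As a rough illustration of the separation, the sketch below uses dataclasses as stand-ins for both layers; in a FastAPI service the response type would more typically be a Pydantic model declared as the route's `response_model`. The classes `MatterRow` and `MatterResponse` and their fields are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass
class MatterRow:
    """Stand-in for an ORM model: mirrors the table, including internal columns."""

    id: int
    firm_id: str
    title: str
    internal_notes: str


@dataclass(frozen=True)
class MatterResponse:
    """The public contract: only the fields clients are meant to see."""

    id: int
    title: str


def to_response(row: MatterRow) -> MatterResponse:
    # Explicit mapping, so a new table column never reaches the API by accident.
    return MatterResponse(id=row.id, title=row.title)
```

For example, `asdict(to_response(row))` contains only `id` and `title`, whatever else the row carries.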

// outcomes

Outcomes

Multi-tenant by construction with strict per-firm isolation
Conversation tier and worker tier scale independently
Identical code path runs locally (Docker Compose) and on Azure — no environment branching

Proprietary — source not public.

Want to talk through any of this?

jntkhandebharad@gmail.com