Platform · LLMOps · DevOps
LLMOps & Deployment Platform
The delivery backbone for CloudLex's GenAI products: multi-stage Azure pipelines, the right compute for each workload, fast rollbacks, and cost and latency observability.
- −22% cloud cost
- Zero-downtime rollouts
- 99.9% uptime
Built at CloudLex
// problem
The problem
Several GenAI services needed to ship to multiple environments many times a day without downtime, and LLM spend needed to be visible and controllable.
// approach
What I built
- Parameterized multi-stage pipelines build and deploy each component independently — HTTP APIs to Container Apps, batch/indexing jobs to Container App Jobs, lightweight triggers to Functions.
- Every build is immutably tagged, so a rollback is a one-command revision switch.
- Token usage and latency are tracked per feature through the model gateway and queried in Application Insights with KQL, so spend can be attributed to a feature and kept under control.
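As an illustration of the per-feature tracking in the last bullet, here is a minimal Python sketch of the kind of rollup such telemetry allows. It is not the production code: the `CallRecord` shape, the `summarize_by_feature` helper, and the prices in `PRICE_PER_1K` are assumptions made up for the example, and the real aggregation would more likely be a KQL query over gateway logs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """One model call as a gateway might log it (hypothetical shape)."""
    feature: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float


# Placeholder prices in USD per 1,000 tokens (assumed, not real rates).
PRICE_PER_1K = {"prompt": 0.5, "completion": 1.5}


def summarize_by_feature(records):
    """Group calls by feature and compute token, cost, and latency totals."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.feature].append(record)

    summary = {}
    for feature, calls in grouped.items():
        prompt = sum(c.prompt_tokens for c in calls)
        completion = sum(c.completion_tokens for c in calls)
        cost = (prompt * PRICE_PER_1K["prompt"]
                + completion * PRICE_PER_1K["completion"]) / 1000
        latencies = [c.latency_ms for c in calls]
        summary[feature] = {
            "calls": len(calls),
            "tokens": prompt + completion,
            "cost_usd": round(cost, 6),
            "avg_latency_ms": sum(latencies) / len(latencies),
            "max_latency_ms": max(latencies),
        }
    return summary


records = [
    CallRecord("search", 1000, 200, 120.0),
    CallRecord("search", 500, 100, 80.0),
    CallRecord("summarize", 2000, 1000, 300.0),
]
report = summarize_by_feature(records)
```

Keying every record by feature is what makes the spend attributable: a cost spike shows up against one feature name instead of in a single aggregate bill.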
// architecture
How it fits together
// decisions
Key technical decisions
Right compute primitive per workload
HTTP APIs run on Container Apps (autoscaling on concurrency), batch jobs on Container App Jobs (scaling to zero between runs), and short triggers on Functions, balancing cost against cold-start latency.
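The workload-to-compute mapping and the idea of scaling on concurrency can be sketched in a few lines of Python. This is a simplified model for illustration only: the `Workload` enum, the `COMPUTE_TARGET` table, and `desired_replicas` are hypothetical names, and the replica formula is a plain ceiling division, not Azure's actual scaler.

```python
import math
from enum import Enum


class Workload(Enum):
    HTTP_API = "http_api"
    BATCH_JOB = "batch_job"
    TRIGGER = "trigger"


# Mapping described in the text: one compute primitive per workload type.
COMPUTE_TARGET = {
    Workload.HTTP_API: "Container Apps",
    Workload.BATCH_JOB: "Container App Jobs",
    Workload.TRIGGER: "Functions",
}


def desired_replicas(concurrent_requests, target_per_replica,
                     min_replicas=0, max_replicas=10):
    """Simplified concurrency-based scaling: enough replicas so that each
    handles at most `target_per_replica` requests, clamped to the bounds."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(concurrent_requests / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))
```

With `min_replicas=0` an idle service costs nothing but pays a cold start on the next request; raising it to 1 trades a small steady cost for a warm instance. That is the cost versus cold-start balance the decision is about.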
Immutable tags + revisions
Deploying by immutable build id means every release is a new revision, so a rollback is a switch back to a previous, already-built revision rather than a rebuild.
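A small Python model shows why immutable tags make rollback cheap. It is a sketch under stated assumptions, not the platform's implementation: `ReleaseHistory`, its methods, and the `registry.example/api` image name are invented for the example, and real revisions would be managed by the hosting platform, not by an in-memory dictionary.

```python
class ReleaseHistory:
    """Tracks revisions keyed by immutable build id (illustrative model)."""

    def __init__(self):
        self._revisions = {}  # build_id -> image reference
        self.active = None    # build id currently receiving traffic

    def deploy(self, build_id, image_repo):
        """Create a new revision; a build id can never be reused."""
        if build_id in self._revisions:
            raise ValueError(f"build id {build_id!r} is already deployed")
        image = f"{image_repo}:{build_id}"
        self._revisions[build_id] = image
        self.active = build_id
        return image

    def rollback(self, build_id):
        """Point traffic at an existing revision; nothing is rebuilt."""
        if build_id not in self._revisions:
            raise KeyError(f"unknown build id {build_id!r}")
        self.active = build_id
        return self._revisions[build_id]


history = ReleaseHistory()
history.deploy("1041", "registry.example/api")  # hypothetical image name
history.deploy("1042", "registry.example/api")
restored = history.rollback("1041")
```

Because a tag is never overwritten, the image behind build `1041` at rollback time is the same one that was originally deployed, which is what makes the switch safe as well as fast.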
// outcomes
Outcomes
- ~22% lower cloud cost
- Zero-downtime, component-by-component rollouts
- 99.9% uptime maintained
Proprietary — source not public.
Want to talk through any of this?
jntkhandebharad@gmail.com