I’m a Software Engineer specializing in backend infrastructure, cloud-native systems, and applied AI in observability, with hands-on experience in multi-tenant, production-grade deployments.
Experience
2024 — Now
New York, United States
Owned the design and evolution of a production LLM platform, enabling multi-agent reasoning across infrastructure, logs, and metrics to support observability workflows and incident diagnostics at enterprise scale.
Built a cross-signal root cause analysis (RCA) capability that correlates alerts, application logs, Kubernetes telemetry, and upstream/downstream service behavior to narrow large-scale incidents down to a small set of actionable root causes.
Introduced deterministic, stateful agent workflows using LangGraph and LangChain, evolving critical paths from controller-driven LLM behavior to explicit execution graphs for improved correctness, debuggability, and reliability.
Delivered a production-grade MCP (Model Context Protocol) integration that serves as the assistant's execution platform, enabling IDE-integrated LLM operations and deterministic multi-step agent workflows for safe, reliable, autonomous AI operations.
As technical lead, drove an end-to-end multi-region expansion, coordinating infrastructure, security, CI/CD, and validation to deliver zero-downtime launches with consistently low latency worldwide.
Designed and deployed Retrieval-Augmented Generation (RAG) pipelines over logs, metrics, and internal documentation, grounding LLM responses in live telemetry for accurate operational insights.
Designed production rollout safeguards using feature flags, controlled rollbacks, and scoped deployments, enabling safe iteration on high-risk AI features without impacting existing customers.
Architected RBAC-aware dynamic prompt composition, enforcing permission boundaries and runtime capability checks to prevent unauthorized tool execution and improve assistant reliability.
Designed a standardized audit logging framework that established consistent logging practices, making failures easier to trace and significantly simplifying debugging and incident analysis for engineers.
2023 — 2024
San Jose, California, United States
Developed an edge computing platform for optimizing distributed ML training (funded by Intel Corporation).
Migrated model execution from a single GPU to a distributed, Kubernetes-based ML hosting platform, improving scalability by 40% and reducing deployment complexity by 30%.
Enhanced backend infrastructure with Golang and Docker to automate and streamline model deployment, cutting debugging time by 50% and improving reliability by 25%.
Unified Prometheus, Grafana, and Loki into a Next.js dashboard, improving cluster management efficiency by 40% and enabling a 50% increase in model usage analytics to guide performance improvements.
2023 — 2024
San Jose, California, United States
2020 — 2022
Hyderabad, Telangana, India
Implemented a monolithic Core Java and PL/SQL application responsible for data processing, claims management, and report generation.
Migrated the monolithic codebase to a Spring Boot microservice architecture, cutting deployment time by 50% and maintaining 99.9% uptime on Kubernetes over 12 months without unplanned outages.
Engineered a data pipeline on Databricks using Spark and Scala, storing processed data efficiently in Delta tables.
Automated code deployment with Jenkins CI/CD pipelines that deployed the latest builds, ran tests, and logged errors, shortening the release cycle by 60%.
2019 — 2019
Hyderabad Area, India
Contributed to the development of an educational platform emphasizing analytical insights and student growth.
Designed and optimized REST and GraphQL APIs, improving data processing efficiency by 20% while supporting diverse use cases at scale.
Used multithreading to speed up data processing and MongoDB storage, making assessment generation 30% faster and accelerating report delivery.
Built interactive React.js visualizations from persisted data, giving students actionable insights into their learning progress and assessment performance and improving user engagement by 60%.
Education
San José State University
Master's degree
Delhi College of Engineering
Bachelor of Technology (B.Tech.)
Ch. Baldev Singh Model School
Class 12
Bal Bhavan Public School