AI Platform + LLM Infrastructure
•Shipped 3 production LLM agents (context-aware chat, account research agent, SDR coaching agent) using RAG, tools, safety layers, and session/state management. All 3 are now core product differentiators highlighted in 2025 sales cycles.
•Own the company-wide LLM proxy (LiteLLM) and observability tool (Langfuse), including authentication, cost tracking, and alerting that caught and prevented multiple 5-figure cost spikes.
•Implemented an agent evaluation framework (LLM-as-judge, synthetic-data regression tests, and golden-dataset testing), now a required step for major agent updates.
ML Ops
•Owned end-to-end ML operations: Delta Lake + Spark data platform, BERT/tree-model training pipelines, MLflow registry, feature store patterns, and batch/real-time inference on Kubernetes.
•Automated dataset creation, model retraining, drift monitoring, and deployment workflows using Spark, Databricks, MLflow, and Kubernetes.
Distributed and Event-Driven Systems
•Scaled Kafka pipelines to millions of emails/day, powering sentiment analysis, signature extraction, and out-of-office (OOO) detection.
•Designed high-throughput, event-driven services processing tens of millions of events/day for real-time deal + buyer models.
•Enforced strict multi-tenancy across all data + ML systems.
Cross-team & Org Leadership
•Bridge AI research, data engineering, backend, and platform, unblocking cross-team initiatives.
•Drove architecture for AI features used by the entire company.
•Mentored engineers on Spark, distributed systems, ML workflows, and agents.