Working on content delivery, personalization, and next-gen viewing experiences on Prime Video. Focused on Ownership, Delivering Results, and Collaboration.
• Led the Prime Video-Alexa service, meeting a 300 ms latency SLA while generating conversational voice and text responses via streaming LLM summarization and processing 100K queries daily.
• Architected distributed microservices for ML inference, achieving 99.9% availability under high-concurrency production traffic.
• Optimized large-scale PyTorch inference pipelines, reducing P99 latency by 3x through parallelization and memory-efficient caching.
• Designed a feature-gating framework to support controlled rollouts, A/B testing, and repeatable evaluation of 10+ AI features.
• Established REST API contracts, unit testing, and CI validation pipelines, reducing pre-release regression defects by 30%.
• Identified system bottlenecks and improved performance through scalable caching and service orchestration.