Results-driven Software Engineer with expertise in generative AI research and hands-on experience building scalable systems for conversational AI.

Experience

Amazon, Software Engineer

2020 — Now

San Francisco Bay Area

Generative AI Shopping Assistant

## AI Model Routing & Thinking Mode System

Built a dynamic model routing and post-processing system for Amazon's latest LLM, enabling adaptive “Agent Thinking” based on query complexity.

Helped improve helpfulness metrics from 26.8% to over 37% by supporting flexible transitions between thinking, planning, and acting modes.

Helped shift from rigid architectures to a unified ReAct framework with PPO-based reinforcement learning, reducing reliance on IFT/DPO.

## RAG with Attribution & Scalable Web Search

Delivered a model-agnostic RAG system with content attribution, supporting integration with 60+ premium publishers.

Proposed and implemented a distributed post-processing architecture to simplify scaling and reduce system complexity.

Helped enable cross-LLM web search with transparent, high-quality third-party sourcing.

## Generative AI System Internationalization

Internationalized the core retrieval service for India and the UK within 3 months of the US launch.

Automated cross-region setup, reducing infrastructure build time by ~70%.

Built region-aware infrastructure processing 20M+ queries in India and 5.8M in the UK with consistent performance.

## LLM Hardware Optimization

Migrated workloads from p4de to g5 and p3 GPU instances, cutting infrastructure costs by ~90%.

Localized GPU infrastructure to eliminate cross-region latency and improve offline indexing throughput.

Enabled international launch by resolving cross-region issues, freeing up shared resources for critical tasks like model training.

## Cross-Service LLM Caching Framework

Designed a cache-aside system spanning the Orchestrator, Retrieval, Model Server, and Cache services to support post-purchase Q&A.

Built centralized signal routing logic and content-aware filtering to improve modularity and response quality.

Improved post-purchase coverage by 5%, with full monitoring and alerting for reliability at scale.

Amazon, SDE Intern

2019 — 2019

Designed and implemented a new feature in Java, SQL, and Scala that lets the Data Ingestion Pipeline support complex events.

Developed an event validator that lets customers validate event schemas and SQL expressions before data onboarding.

Resolved complex-event support requests from more than 10 Alexa Shopping data producer teams.

University of Michigan, Visiting Researcher

2017 — 2018

fMRI Lab, Ann Arbor

Extracted physical features from complex images with local normalization

Proposed a complex-valued convolutional neural network to determine the blur frequency in magnetic resonance images.

Implemented generative adversarial networks to generate field maps for MR image deblurring, achieving a 6% improvement in accuracy.

Microsoft Research Asia, Student Research Assistant

2016 — 2017

Beijing, China

Introduced generative adversarial networks with a new loss formulation for unsupervised visual representation learning.

Implemented a parallel-computing pipeline combining object segmentation and visual representation learning for medical image classification, improving F-score by 5% in clinical disease prediction (PyTorch).

Proposed machine learning methods for medical image classification, object detection, and segmentation.

Reviewed over 100 recent papers on machine learning in medical image analysis and reproduced the core methods for comparison and discussion (PyTorch, Caffe, TensorFlow).

Education

2018 — 2020

University of Southern California

Master's degree


2014 — 2018

Beihang University

Bachelor's degree
