Results-driven Software Engineer with expertise in generative AI research and hands-on experience building scalable systems for conversational AI.
Experience
2020 – Now
San Francisco Bay Area
Generative AI shopping assistant
## AI Model Routing & Thinking Mode System
Built a dynamic model routing and post-processing system for Amazon's latest LLM, enabling adaptive "Agent Thinking" based on query complexity.
Helped improve helpfulness metrics from 26.8% to 37%+ by supporting flexible transitions between thinking, planning, and acting modes.
Helped shift from rigid architectures to a unified ReAct framework with PPO-based reinforcement learning, reducing reliance on IFT/DPO.
## RAG with Attribution & Scalable Web Search
Delivered a model-agnostic RAG system with content attribution, supporting integration with 60+ premium publishers.
Proposed and implemented a distributed post-processing architecture to simplify scaling and reduce system complexity.
Helped enable cross-LLM web search with transparent, high-quality third-party sourcing.
## Generative AI System Internationalization
Internationalized core retrieval service to India and UK within 3 months of US launch.
Automated cross-region setup, reducing infrastructure build time by ~70%.
Built region-aware infrastructure processing 20M+ queries in India and 5.8M in the UK with consistent performance.
## LLM Hardware Optimization
Migrated workloads from p4de to g5 and p3 GPU instances, cutting infrastructure costs by ~90%.
Localized GPU infrastructure to eliminate cross-region latency and improve offline indexing throughput.
Enabled international launch by resolving cross-region issues, freeing up shared resources for critical tasks like model training.
## Cross-Service LLM Caching Framework
Designed a cache-aside system across Orchestrator, Retrieval, Model Server, and Cache to support post-purchase Q&A.
Built centralized signal routing logic and content-aware filtering to improve modularity and response quality.
Improved post-purchase coverage by 5%, with full monitoring and alerting for reliability at scale.
2019 – 2019
Designed and implemented a new feature in Java, SQL, and Scala that enables the data ingestion pipeline to support complex events
Developed an event validator that lets customers validate event schemas and SQL expressions before data onboarding
Resolved complex-event support requests from more than 10 Alexa shopping data producer teams
fMRI Lab, Ann Arbor
Extracted physical features from complex images with local normalization
Proposed a complex-valued convolutional neural network to determine the blur frequency in magnetic resonance images
Implemented generative adversarial networks to generate field maps for MR image deblurring, achieving a 6% improvement in accuracy
Beijing City, China
Introduced generative adversarial networks with a new loss formulation for unsupervised visual representation learning
Implemented a parallel-computing pipeline combining object segmentation and visual representation learning for medical image classification, improving F-score by 5% in clinical disease prediction (PyTorch)
Proposed machine learning methods for medical image classification, object detection, and segmentation
Reviewed over 100 recent papers on machine learning in medical image analysis and reproduced the core methods for comparison and discussion (PyTorch, Caffe, TensorFlow)
Education
2018 – 2020
University of Southern California
Master's degree
2014 – 2018
Beihang University
Bachelor's degree