# Ye Tang

> SDE at Amazon

Location: San Francisco Bay Area, United States
Profile: https://flows.cv/ye

Results-driven Software Engineer with expertise in generative AI research and hands-on experience building scalable systems for conversational AI. Demonstrated success in architecting and implementing critical infrastructure for large language model (LLM) deployment, cross-region service expansion, and intelligent retrieval systems. At Amazon Rufus, I develop AI infrastructure that powers a shopping assistant used by millions daily. Passionate about bridging research and engineering to turn cutting-edge models into real-world applications.

## Work Experience
### Software Engineer @ Amazon
Jan 2020 – Present | San Francisco Bay Area
**Generative AI Shopping Assistant**
#### AI Model Routing & Thinking Mode System
- Built a dynamic model routing and post-processing system for Amazon's latest LLM, enabling adaptive “Agent Thinking” based on query complexity.
- Helped improve helpfulness metrics from 26.8% to 37%+ by supporting flexible transitions between thinking, planning, and acting modes.
- Helped shift from rigid architectures to a unified ReAct framework with PPO-based reinforcement learning, reducing reliance on IFT/DPO.

#### RAG with Attribution & Scalable Web Search
- Delivered a model-agnostic RAG system with content attribution, supporting integration with 60+ premium publishers.
- Proposed and implemented a distributed post-processing architecture to simplify scaling and reduce system complexity.
- Helped enable cross-LLM web search with transparent, high-quality third-party sourcing.

#### Generative AI System Internationalization
- Expanded the core retrieval service to India and the UK within 3 months of the US launch.
- Automated cross-region setup, reducing infrastructure build time by ~70%.
- Built region-aware infrastructure processing 20M+ queries in India and 5.8M in the UK with consistent performance.

#### LLM Hardware Optimization
- Migrated workloads from p4de to g5 and p3 GPU instances, cutting infrastructure costs by ~90%.  
- Localized GPU infrastructure to eliminate cross-region latency and improve offline indexing throughput.  
- Enabled international launch by resolving cross-region issues, freeing up shared resources for critical tasks like model training.

#### Cross-Service LLM Caching Framework
- Designed a cache-aside system across Orchestrator, Retrieval, Model Server, and Cache to support post-purchase Q&A.
- Built centralized signal routing logic and content-aware filtering to improve modularity and response quality.  
- Improved post-purchase coverage by 5%, with full monitoring and alerting for reliability at scale.

### SDE Intern @ Amazon
Jan 2019 – Jan 2019
- Designed and implemented a new feature in Java, SQL, and Scala that lets the Data Ingestion Pipeline support complex events.
- Developed an event validator that lets customers validate event schemas and SQL expressions before data onboarding.
- Resolved complex-event support requests from more than 10 Alexa shopping data producer teams.

### Visiting Researcher @ University of Michigan
Jan 2017 – Jan 2018 | fMRI Lab, Ann Arbor
- Extracted physical features from complex images with local normalization.
- Proposed a complex-valued convolutional neural network to determine the blur frequency in magnetic resonance images.
- Implemented generative adversarial networks to generate field maps for MR image deblurring, achieving a 6% improvement in accuracy.

### Student Research Assistant @ Microsoft Research Asia Alumni
Jan 2016 – Jan 2017 | Beijing City, China
- Introduced generative adversarial networks with a new loss formulation for unsupervised visual representation learning.
- Implemented a parallel-computing pipeline combining object segmentation and visual representation learning for medical image classification, improving F-score by 5% in clinical disease prediction (PyTorch).
- Proposed machine learning methods for medical image classification, object detection, and segmentation.
- Reviewed over 100 recent papers on machine learning in medical image analysis and reproduced the core methods for comparison and discussion (PyTorch, Caffe, TensorFlow).


## Education
### Master's degree in Computer Science
University of Southern California
Jan 2018 – Jan 2020

### Bachelor's degree
Beihang University
Jan 2014 – Jan 2018


## Contact & Social
- LinkedIn: https://linkedin.com/in/ye-tang-3407a9141

---
Source: https://flows.cv/ye
JSON Resume: https://flows.cv/ye/resume.json
Last updated: 2026-03-22