# Ye Tang

> SDE at Amazon

Location: San Francisco Bay Area, United States
Profile: https://flows.cv/ye

Results-driven Software Engineer with expertise in generative AI research and hands-on experience building scalable systems for conversational AI. Demonstrated success in architecting and implementing critical infrastructure for large language model (LLM) deployment, cross-region service expansion, and intelligent retrieval systems. At Amazon Rufus, I develop AI infrastructure that powers a shopping assistant used by millions daily. Passionate about bridging research and engineering to turn cutting-edge models into real-world applications.

## Work Experience

### Software Engineer @ Amazon

Jan 2020 – Present | San Francisco Bay Area

**Generative AI Shopping Assistant**

#### AI Model Routing & Thinking Mode System

- Built a dynamic model routing and post-processing system for Amazon's latest LLM, enabling adaptive "Agent Thinking" based on query complexity.
- Helped improve helpfulness metrics from 26.8% to 37%+ by supporting flexible transitions between thinking, planning, and acting modes.
- Helped shift from rigid architectures to a unified ReAct framework with PPO-based reinforcement learning, reducing reliance on IFT/DPO.

#### RAG with Attribution & Scalable Web Search

- Delivered a model-agnostic RAG system with content attribution, supporting integration with 60+ premium publishers.
- Proposed and implemented a distributed post-processing architecture to simplify scaling and reduce system complexity.
- Helped enable cross-LLM web search with transparent, high-quality third-party sourcing.

#### Generative AI System Internationalization

- Internationalized the core retrieval service to India and the UK within 3 months of the US launch.
- Automated cross-region setup, reducing infrastructure build time by ~70%.
- Built region-aware infrastructure processing 20M+ queries in India and 5.8M in the UK with consistent performance.

#### LLM Hardware Optimization

- Migrated workloads from p4de to g5 and p3 GPU instances, cutting infrastructure costs by ~90%.
- Localized GPU infrastructure to eliminate cross-region latency and improve offline indexing throughput.
- Enabled international launch by resolving cross-region issues, freeing up shared resources for critical tasks like model training.

#### Cross-Service LLM Caching Framework

- Designed a cache-aside system across Orchestrator, Retrieval, Model Server, and Cache to support post-purchase Q&A.
- Built centralized signal routing logic and content-aware filtering to improve modularity and response quality.
- Improved post-purchase coverage by 5%, with full monitoring and alerting for reliability at scale.

### SDE Intern @ Amazon

Jan 2019 – Jan 2019

- Designed and implemented a new feature in Java, SQL, and Scala that enables the Data Ingestion Pipeline to support complex events.
- Developed an event validator that lets customers validate event schemas and SQL expressions before data onboarding.
- Resolved complex-event support requests from more than 10 Alexa shopping data producer teams.

### Visiting Researcher @ University of Michigan

Jan 2017 – Jan 2018 | fMRI Lab, Ann Arbor

- Extracted physical features from complex images with local normalization.
- Proposed a complex-valued convolutional neural network to determine the frequency of blur in magnetic resonance images.
- Implemented generative adversarial networks to generate field maps for MR image deblurring, improving accuracy by 6%.

### Student Research Assistant @ Microsoft Research Asia Alumni

Jan 2016 – Jan 2017 | Beijing City, China

- Introduced Generative Adversarial Networks with a new loss formulation for unsupervised visual representation learning.
- Implemented a parallel-computing pipeline combining object segmentation and visual representation learning for medical image classification, improving F-score by 5% in clinical disease prediction (PyTorch).
- Proposed machine learning methods for medical image classification, object detection, and segmentation.
- Reviewed over 100 recent papers on machine learning in medical image analysis and reproduced the core methods for comparison and discussion (PyTorch, Caffe, TensorFlow).

## Education

### Master's degree in Computer Science

University of Southern California
Jan 2018 – Jan 2020

### Bachelor's degree

Beihang University
Jan 2014 – Jan 2018

## Contact & Social

- LinkedIn: https://linkedin.com/in/ye-tang-3407a9141

---

Source: https://flows.cv/ye
JSON Resume: https://flows.cv/ye/resume.json
Last updated: 2026-03-22