Experience
2020 — Now
San Jose, California, United States
Staff Software Engineer – AI Infrastructure (LLM Inference & RL Systems)
Led the architecture and optimization of large-scale LLM inference systems, focusing on long-context serving, speculative decoding, and prefill/decode disaggregation to improve latency, throughput, and cost efficiency.
Built a high-performance LLM serving stack with deep integration of SGLang, TensorRT-LLM, and custom GPU optimizations (KV-cache management, batching, scheduling), enabling efficient serving of large models under production constraints.
Designed advanced request routing and scheduling strategies for search-driven multi-request workloads, ensuring high GPU utilization and strict latency SLOs.
Developed and scaled RL training infrastructure (Verl, AReaL), supporting SFT, RLHF, and agentic RL workloads, improving training efficiency and system scalability.
Built an end-to-end e-commerce search system from scratch, including recall, ranking, indexing, and data feedback loops.
2018 — 2020
Shanghai, China
Built a storage system for small-data workloads, developing core components including data storage, data consistency, and system reliability.
Built a search engine, developing core components including high-performance computing, distributed storage, and system reliability.
Education
Shanghai Jiao Tong University
Master's degree