Experience
2020 — Now
San Jose, California, United States
Staff Software Engineer – AI Infrastructure (LLM Inference & RL Systems)
Led the architecture and optimization of large-scale LLM inference systems, focusing on long-context serving, speculative decoding, and prefill/decode disaggregation to improve latency, throughput, and cost efficiency.
Built a high-performance LLM serving stack with deep integration of SGLang, TensorRT-LLM, and custom GPU optimizations (KV-cache management, batching, scheduling), enabling efficient serving of large models under production constraints.
Designed advanced request routing and scheduling strategies for search-driven multi-request workloads, ensuring high GPU utilization and strict latency SLOs.
Developed and scaled RL training infrastructure (Verl, AReaL), supporting SFT, RLHF, and agentic RL workloads, improving training efficiency and system scalability.
Built an end-to-end e-commerce search system from scratch, including recall, ranking, indexing, and data feedback loops.
2018 — 2020
Shanghai, China
Built a storage system for small-data workloads, developing core components including data storage, data consistency, and system reliability.
Built a search engine, developing core components including high-performance computing, distributed storage, and system reliability.
Education
Shanghai Jiao Tong University
Master's degree