• vLLM, TensorRT-LLM, PyTorch
• Inference optimization for latency, throughput, and cost
• Speculative decoding, paged attention, continuous batching, quantization, etc.
Experience
Education
Beijing Institute of Technology
Bachelor's degree
Tsinghua University
Master's degree
University of Massachusetts Amherst