Experience

2022 — Now

*Developed and contributed to TikTok’s global-scale distributed GPU serving infrastructure for recommendation systems, improving throughput, reducing latency, and enhancing reliability.

*Designed and contributed to key GPU model serving optimizations, including an MLIR-based automatic operator fusion compiler and GPU kernel development, resulting in tens of millions of dollars in annual cost savings.

*Collaborated with Rank, Ads, E-commerce, and many other business teams on model inference optimization, enabling the successful launch of critical models and algorithms.

*Enhanced the serving infrastructure so that business teams could scale up their models strategically, driving significant business impact.

*Introduced and deployed the in-house AI hardware accelerator in non-China regions and developed further optimizations for it.

*Introduced vLLM-based LLM serving to meet in-house business needs.

*Mentored new engineers.

DinoplusAI, Cofounder

2017 — 2017

San Francisco Bay Area

DinoplusAI designs AI processors and the software to run them in data centers. Our approach optimizes for inference, with a focus on performance, power efficiency, and ease of use, while also enabling cost-effective training.

Cisco / Insieme Networks, Senior Technical Lead

2013 — 2017

San Francisco Bay Area

Lab126, Software Development Engineer

2011 — 2013

Capsovision, Senior Manager of Software

2008 — 2011

Education

Shanghai Jiao Tong University
