*Developed and contributed to TikTok’s global-scale, distributed GPU serving infrastructure for recommendation systems, improving throughput, reducing latency, and enhancing reliability.
*Designed and contributed to key GPU model-serving optimizations, including an MLIR-based automatic operator-fusion compiler and GPU kernel development, saving tens of millions of dollars annually.
*Collaborated with Rank, Ads, E-commerce, and other business teams on model inference optimization, enabling the successful launch of critical models and algorithms.
*Enhanced the serving infrastructure so business teams could strategically scale up their models, driving significant business impact.
*Introduced and deployed the in-house AI hardware accelerator in non-China regions and developed further optimizations for it.
*Introduced vLLM-based LLM serving to meet internal business needs.
*Mentored new engineers.