Experienced Software Development Engineer who implemented Generative AI projects serving 20,000+ QPS at TikTok. Adept at optimizing GPU service performance and designing ML frameworks that served 150+ services. Skilled in microservices architecture, Kubernetes, and GPU inference acceleration. Works mostly in Python and C++.

Experience

TikTok, Software Development Engineer II

2021 — Now

Designed scalable AI models and Kubernetes-based GPU service frameworks, optimized hardware performance and legacy services, and implemented a highly available, efficient microservices architecture at TikTok AI-Lab from 2021 to 2023.

Project: AI-Manga: the best-performing effect in TikTok's history to date, as highlighted by Zhu Wenjia, Global TikTok R&D Chief, during the CEO Business Talk at the 11th ByteDance Anniversary in March 2023.

Led the initial engineering design and service framework support for the model, ensuring stability under high traffic (peak QPS 20,000+) on the Big Day. Decided on immediate downgrade solutions for online service issues and coordinated cross-department troubleshooting.

Customized optimizations for fixed hardware using TensorRT, improving generation algorithm performance by 1.6x on the selected GPU.

TikTok Online Service: https://www.tiktok.com/sticker/AI-Manga-6048208

Technical Share: https://cloud.tencent.com/developer/article/2228816

Platform Infrastructure: Designed and implemented Ivory, a Kubernetes GPU service framework.

Provides resource management (e.g. GPU scheduling, monitoring, resource isolation) by injecting an underlying CUDA C++ library.

Integrates upper-layer logic, including RPC, HTTP, RocketMQ, and Kafka interface exposure, compliance auditing, and internal k6 and Grafana tooling, to orchestrate the whole service lifecycle.

Legacy Service Optimization: Developed a pipe-based C++/Python library that works around the GIL while maintaining thread safety, improving inference-phase efficiency. Optimized a critical legacy service with the library (from 20 to 45 QPS per T4), saving 150+ GPU instances and earning a spot bonus.

Microservices Architecture: Led the migration of a monolithic service into three microservices and implemented CI/CD pipelines based on internal Jenkins and GitLab. Over the 6 months after deployment, the system achieved a 99.99% availability SLA while handling 3,000+ QPS.

ByteDance, Machine Learning System Intern

2024 — Now

Seattle, Washington, United States

Participate in the development of Lineage, a comprehensive model resource tracking platform that manages AI assets collected from HDFS, a data lake built on Apache Iceberg, and the Model Zoo.

Alibaba Group, Software Engineer Internship

2020 — 2020

Developed a graph-based frequent itemset analysis algorithm for CI/CD pipeline data stored in GraphDB and processed with GraphQL, reaching 75.5% accuracy over two months.

Used Helm to manage Kubernetes applications, with Helm charts for templating and versioning.

Participated in updating the Prometheus monitoring module for the Alibaba Hybrid Cloud Parent Image.

Education

Carnegie Mellon University

Master's degree

Huazhong University of Science and Technology

Bachelor's degree