Designed scalable AI models and Kubernetes-based GPU service frameworks, optimized hardware performance and legacy services, and implemented a high-availability, efficient microservices architecture at TikTok AI-Lab from 2021 to 2023.
•Project: AI-Manga [the best-performing effect in TikTok's history to date; highlighted by Zhu Wenjia, Global TikTok R&D Chief, during the CEO Business Talk at the 11th ByteDance Anniversary in March 2023].
•Led the initial engineering design and service-framework support for the model, keeping the service stable under peak traffic of 20,000+ QPS on the Big Day. Made immediate service-degradation (downgrade) decisions during online incidents, then coordinated cross-department troubleshooting.
•Applied TensorRT optimizations tailored to the fixed target hardware, speeding up the generation algorithm by 1.6x on the selected GPU.
•TikTok Online Service: https://www.tiktok.com/sticker/AI-Manga-6048208
•Technical Share: https://cloud.tencent.com/developer/article/2228816
•Platform Infrastructure: Designed and implemented Ivory, a Kubernetes-based GPU service framework.
•Provides resource management (e.g., GPU scheduling, monitoring, and resource isolation) by injecting an underlying CUDA C++ library.
•Integrates upper-layer logic, including RPC, HTTP, RocketMQ, and Kafka interface exposure, compliance auditing, and the internal K6 Grafana, to orchestrate the full service lifecycle.
•Legacy Service Optimization: Developed a pipe-based C++/Python library that works around the Python GIL while preserving thread safety, improving inference-stage efficiency. Used the library to optimize a critical legacy service (from 20 to 45 QPS per T4 GPU), saving 150+ GPU instances and earning a spot bonus.
•Microservices Architecture: Led the migration of a monolithic service into three microservices and implemented CI/CD pipelines on internal Jenkins and GitLab. Over the 6 months following deployment, the system met a 99.99% availability SLA while handling 3,000+ QPS.