Model inference and fine-tuning
2025 — Present
Working on LLM inference efficiency: weight loading, batching, and inference profiling/optimization.
1. Token-count-based batching for faster, cheaper embedding inference: https://www.mongodb.com/company/blog/engineering/token-count-based-batching-faster-cheaper-embedding-inference-for-queries
2. Led and coordinated engineering efforts to serve the Voyage 4 series models: https://huggingface.co/voyageai
3. Profiled and tuned the MFU (model FLOPs utilization) of our MoE model
4. Onboarded voyage-4-nano to vLLM
5. [WIP] Model switching on a single GPU in one second
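The token-count-based batching in item 1 can be sketched roughly as follows. This is an illustrative greedy batcher, not the production implementation: the function name and the `max_tokens_per_batch` budget are assumptions. The idea is to cap each batch by total token count rather than by request count, so batches of short queries stay large while long inputs don't blow up memory or latency.

```python
from typing import Iterable, Iterator


def batch_by_token_count(
    token_counts: Iterable[int],
    max_tokens_per_batch: int,
) -> Iterator[list[int]]:
    """Greedily group request indices so each batch's total token count
    stays within a budget, instead of using a fixed batch size.

    A single request longer than the budget still gets its own batch.
    """
    batch: list[int] = []
    batch_tokens = 0
    for idx, n_tokens in enumerate(token_counts):
        # Flush the current batch if adding this request would exceed the budget.
        if batch and batch_tokens + n_tokens > max_tokens_per_batch:
            yield batch
            batch, batch_tokens = [], 0
        batch.append(idx)
        batch_tokens += n_tokens
    if batch:
        yield batch


# Example: with a 1000-token budget, short queries pack together while
# long ones split off into their own batches.
batches = list(batch_by_token_count([100, 200, 700, 50, 950, 10], 1000))
# → [[0, 1, 2], [3, 4], [5]]
```

Compared with fixed-size batching, the per-batch compute is roughly uniform, which is what makes embedding inference on mixed-length query traffic both faster and cheaper.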
2023 — 2025
AI and ML infrastructure
Big data infrastructure
2021 — 2023
Data infrastructure and platform
ML infrastructure
2019 — 2021
Worked on, and was on-call for, the following services and infrastructure:
1. Kafka/ZooKeeper clusters and Kafka tools/libraries
2. Schema registry and serde libraries in Go/Python
3. Seldon Core model-serving platform
4. Experiments service
5. Feast feature store
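The schema registry and serde work in item 2 revolves around a message framing such as the Confluent-style wire format, where each payload is prefixed with a magic byte and a 4-byte schema ID so consumers can look up the right schema. The sketch below only illustrates that framing; it is not the actual library, and the function names are assumptions.

```python
import struct

# Confluent-style framing: 1 magic byte (0), 4-byte big-endian schema ID, payload.
MAGIC_BYTE = 0


def encode(schema_id: int, payload: bytes) -> bytes:
    """Prefix an already-serialized payload with schema-registry framing."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def decode(message: bytes) -> tuple[int, bytes]:
    """Split a framed message back into (schema_id, payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[5:]


# Round trip: the consumer recovers the schema ID and raw payload.
assert decode(encode(42, b"hello")) == (42, b"hello")
```

Keeping the schema ID in the message (rather than the schema itself) keeps Kafka messages compact while letting Go and Python serde libraries resolve schemas from the same registry.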
2019
University of Waterloo
Southeast University