Model inference and fine-tuning
2025 — Present
Working on LLM inference efficiency: weight loading, batching, and inference profiling/optimization.
1. Token-count-based batching for faster, cheaper embedding inference: https://www.mongodb.com/company/blog/engineering/token-count-based-batching-faster-cheaper-embedding-inference-for-queries
2. Led and coordinated engineering efforts to serve the Voyage 4 series models: https://huggingface.co/voyageai
3. Profiled and tuned the MFU (model FLOPs utilization) of our MoE model
4. Onboarded voyage-4-nano to vLLM
5. [WIP] Model switching on a single GPU in one second
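The token-count-based batching in item 1 can be sketched roughly as follows. This is an illustrative greedy batcher, not the production implementation: the function name and the `max_tokens_per_batch` budget are assumptions. The idea is to cap each batch by total token count rather than by request count, so batches of short queries stay large while long inputs don't blow up memory or latency.

```python
from typing import Iterable, Iterator


def batch_by_token_count(
    token_counts: Iterable[int],
    max_tokens_per_batch: int,
) -> Iterator[list[int]]:
    """Greedily group request indices so each batch's total token count
    stays within a budget, instead of using a fixed batch size.

    A single request longer than the budget still gets its own batch.
    """
    batch: list[int] = []
    batch_tokens = 0
    for idx, n_tokens in enumerate(token_counts):
        # Flush the current batch if adding this request would exceed the budget.
        if batch and batch_tokens + n_tokens > max_tokens_per_batch:
            yield batch
            batch, batch_tokens = [], 0
        batch.append(idx)
        batch_tokens += n_tokens
    if batch:
        yield batch


# Example: with a 1000-token budget, short queries pack together while
# long ones split off into their own batches.
batches = list(batch_by_token_count([100, 200, 700, 50, 950, 10], 1000))
# → [[0, 1, 2], [3, 4], [5]]
```

Compared with fixed-size batching, the per-batch compute is roughly uniform, which is what makes embedding inference on mixed-length query traffic both faster and cheaper.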
2023 — 2025
AI and ML infrastructure
Big data infrastructure
2021 — 2023
Data infrastructure and platform
ML infrastructure
2019 — 2021
Worked on, and was on-call for, the following services and infrastructure:
1. Kafka/ZooKeeper clusters and Kafka tools/libraries
2. Schema registry and serde libraries in Go/Python
3. Seldon Core model-serving platform
4. Experiments service
5. Feast feature store
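The schema registry and serde work in item 2 revolves around a message framing such as the Confluent-style wire format, where each payload is prefixed with a magic byte and a 4-byte schema ID so consumers can look up the right schema. The sketch below only illustrates that framing; it is not the actual library, and the function names are assumptions.

```python
import struct

# Confluent-style framing: 1 magic byte (0), 4-byte big-endian schema ID, payload.
MAGIC_BYTE = 0


def encode(schema_id: int, payload: bytes) -> bytes:
    """Prefix an already-serialized payload with schema-registry framing."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def decode(message: bytes) -> tuple[int, bytes]:
    """Split a framed message back into (schema_id, payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[5:]


# Round trip: the consumer recovers the schema ID and raw payload.
assert decode(encode(42, b"hello")) == (42, b"hello")
```

Keeping the schema ID in the message (rather than the schema itself) keeps Kafka messages compact while letting Go and Python serde libraries resolve schemas from the same registry.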
2019
University of Waterloo
Southeast University