SWE @ Meta
- LLM stuff - https://github.com/cthi
2025 — Now
GPU kernel development with CUDA, CUTLASS, CK, and Triton
FP8 and FP4 quantization and GEMM kernel development
Performance optimization
Open source development
2024 — 2025
New York, New York, United States
Distributed & Disaggregated LLM Inference
vLLM Performance Optimization
Production debugging and fleet management
2024 — 2024
2022 — 2024
2022 — 2022
University of Waterloo
Bur Oak Secondary School