# 1. Inference Optimization (Speculative Decoding)
• Led end-to-end research and development of speculative decoding systems for production LLM inference
• Architected and implemented online draft-model training, updating draft models continuously during real-time serving
• Designed a hybrid speculator (model-based + model-free) with score-based dynamic routing between the two approaches
• Built and deployed multiple model-based and model-free speculators
• Trained 50+ draft models across Friendli Serverless and customer endpoints, optimized for diverse workload patterns
# 2. Inference Optimization (Kernel-Level)
• Developed high-performance kernels for core LLM operations and sampling
• Implemented specialized kernels for speculative decoding, improving end-to-end inference efficiency
# 3. Inference Runtime Development
• Contributed to core inference runtime systems, including memory management, scheduling, and API server
# 4. Distributed Training (PeriFlow)
• Led initial product development of PeriFlow, a distributed training platform for multi-cloud GPU environments
• Architected fault-tolerance and resource-management systems for reliable large-scale training
• Led the training and release of FAI-13B, shipped ahead of Meta’s Llama 2
# 5. Solutions Architecture Leadership (US)
• Led the US Solutions Architect team, supporting 100+ customer PoCs
• Managed strategic partnerships with cloud providers including AWS
# 6. Open Source & Ecosystem
• Contributed to major open-source ecosystems including LangChain and LlamaIndex