Full Stack AI Engineer | Continual AI
2024 — Now
San Francisco Bay Area
2021 — 2024
2020 — 2021
Santa Clara County, California, United States
• Developer on Triton Inference Server, optimizing GPU-based inference orchestration.
• Led creation of Triton Model Analyzer, streamlining workload-agnostic configuration optimization for Triton Inference Server on NVIDIA GPUs.
• Improved performance tuning with parameter search algorithms and detailed reporting of latency vs. throughput curves.
2018 — 2020
Urbana-Champaign, Illinois Area
2019 — 2019
Santa Clara, California
• Implemented performance improvements and cloud model-store integration in the TensorRT Inference Server.
University of Illinois Urbana-Champaign