Experienced Computer Architect and Technical Leader with a background in AI acceleration and HW/SW codesign, focusing on end-to-end application performance and Perf/Watt. I am passionate about computer system performance and making applications run faster!

2024 - Now, Google, Software Engineer - Tech Lead

Leading teams working on performance optimization for Gemini on GPUs and on analytical modeling of workload performance on current and future GPU generations

2021 - 2024, Cerebras Systems, Member of Technical Staff

San Francisco Bay Area

Performance and Software Optimization for DL training on the world's largest chip

Architected and built a SW stack for training CV models (CNNs and Diffusion) on the Cerebras WSE. Also served as tech lead for the 15-person team working on this effort

Optimized performance and led SW bringup to enable LLM training on next-generation ASIC HW and system (CS3)

2017 - 2021, NVIDIA, Senior GPU Architect

San Francisco Bay Area

I was a part of NVIDIA's core Deep Learning Architecture group working on HPC and ML kernel performance.

Before that, I was an Architect in the SM Architecture group working on improving energy efficiency. My contributions involved studying the mapping of Deep Learning applications on the SM and Tensor cores, and working on micro-architectural strong scaling features to reduce energy consumption. I also spent time prototyping JIT translators for generating performant GPU assembly.

May 2020 - June 2021

Delivered high-performance ML kernels for CUDA libraries

Also contributed to CUTLASS OSS - https://github.com/NVIDIA/cutlass

Jun 2017 - May 2020

Worked on the architecture and design of the GPU SM and Tensor cores

Developed binary translators for generating performant GPU assembly