2025: Gemma 3n launch; led model co-design efforts, focusing on efficient on-device model performance.
2023-2024: Spearheaded hardware-software co-design for TPU, optimizing inference and training efficiency; deployed efficiency techniques for a wide set of Gemini models.
2022-2023: First generative AI for mobile; led model compression to achieve the optimal memory/latency-quality trade-off for the model.
2022: Developed efficient inference for transformers, quantized PaLM, and co-led the effort to launch it in its first product.
2019-2024: Conducted research in quantization/sparsity for transformers, launching inference-efficient techniques for Translate models. Developed a compression library to enable research at production scale. Contributed to hardware/software co-design, implementing multiple efficiency features across several TPU generations.
2017-2019: Core developer for TensorFlow and efficient TPU runtime infrastructure.