• Work closely with hardware architecture, ML compiler, and ML framework teams, as well as ML researchers; publish and open-source my own research.
• Develop Edge TPU-optimized model architectures and open-source them in Google's TensorFlow Model Garden.
• Build new backbones for visual perception using CNN- and attention-based models; build heads for semantic segmentation, instance segmentation, and detection.
• Post-training model inference optimization.
• Fit large models into memory by optimizing sharding patterns and spill/fill traffic.
• General-purpose TPU computing, TPU-friendly sparsity, and quantization methods for performance.
• Recommendations for ML hardware, compiler, and runtime design.
• Analytical performance and power modeling; automated memory layout.
• Standard model transformations in the compiler.
• Post-silicon performance validation and software stack optimization.
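The post-training quantization mentioned above can be illustrated with a minimal sketch. This shows a generic symmetric per-tensor int8 scheme, not the specific method used in this work; the function names are illustrative:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: the scale maps max |w| to 127."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float32 tensor from int8 codes and the scale."""
    return q.astype(np.float32) * scale

# Toy weight tensor: round-trip it and measure the worst-case error,
# which for symmetric rounding is bounded by half the scale.
w = np.array([0.5, -1.27, 0.03, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
err = float(np.max(np.abs(dequantize(q, s) - w)))
```

Schemes deployed in practice add per-channel scales, zero points for asymmetric ranges, and calibration over activation statistics, but the scale/round/clip core above is the common building block.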