Machine Learning Accelerators org, worked on:
ML runtime and frameworks
• Implement features in Cruise's ML inference framework, working closely with CUDA and ML engineers
• Integrate and maintain third-party libraries in the inference framework, including TensorRT, TensorFlow, PyTorch, CUDA, and Eigen
• Deploy ML models for inference in simulation and on-road, working closely with ML engineers
• Core contributor to the redesign of Cruise's ML runtime framework to better support multiple generations of compute hardware and integrate tightly with the ML compiler (see the illustrative sketch after this list)
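A minimal, purely illustrative C++ sketch of the kind of backend abstraction a multi-hardware ML runtime can be organized around; InferenceBackend, CpuBackend, and makeBackend are hypothetical names, not Cruise's actual API.

```cpp
// Illustrative sketch, not Cruise code: one execution interface, one
// implementation per compute generation, selected by a factory.
#include <iostream>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Backend-agnostic execution interface; real backends would wrap TensorRT,
// a custom-accelerator SDK, or a CPU fallback.
class InferenceBackend {
public:
    virtual ~InferenceBackend() = default;
    virtual void load(const std::string& compiled_model) = 0;
    virtual std::vector<float> run(const std::vector<float>& input) = 0;
};

// Stub CPU backend, present only to keep the sketch self-contained.
class CpuBackend : public InferenceBackend {
public:
    void load(const std::string& compiled_model) override {
        model_ = compiled_model;  // a real backend would deserialize an engine here
    }
    std::vector<float> run(const std::vector<float>& input) override {
        return input;  // identity "model" as a placeholder
    }
private:
    std::string model_;
};

// Factory keyed on the target hardware generation; the ML compiler would emit
// a backend-specific artifact that load() consumes.
std::unique_ptr<InferenceBackend> makeBackend(const std::string& target) {
    if (target == "cpu") return std::make_unique<CpuBackend>();
    throw std::runtime_error("unknown backend: " + target);
}

int main() {
    auto backend = makeBackend("cpu");
    backend->load("model.plan");
    const auto outputs = backend->run({1.0f, 2.0f, 3.0f});
    std::cout << "output elements: " << outputs.size() << "\n";
    return 0;
}
```

Keeping backend selection behind a single factory lets the same deployment path target GPUs, custom accelerators, or a CPU fallback.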
Custom ML accelerators
• Evaluated vendors for ML accelerator IP using metrics such as TOPS and inference latency
• Created tooling to simulate ML performance of custom silicon before first boards were available (see the first-order estimate sketch after this list)
• Worked closely with hardware and embedded engineers to give the wider org access to custom AI chips, enabling early development via hardware-in-the-loop (HIL) setups
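A purely illustrative, first-order (roofline-style) latency estimate of the kind used to compare accelerator IP before silicon is available; all figures below are made up for the example.

```cpp
// Illustrative only: bound inference latency by the slower of the compute
// roofline and the memory-bandwidth roofline. Numbers are hypothetical.
#include <algorithm>
#include <cstdio>

int main() {
    // Workload description for one inference of a hypothetical model.
    const double ops         = 40e9;   // 40 GOPs per inference
    const double bytes_moved = 200e6;  // 200 MB of weights + activations

    // Candidate accelerator spec-sheet values (hypothetical).
    const double peak_tops   = 50.0;   // 50 TOPS (INT8)
    const double dram_gbps   = 100.0;  // 100 GB/s memory bandwidth
    const double utilization = 0.4;    // assumed achievable fraction of peak

    const double compute_s = ops / (peak_tops * 1e12 * utilization);
    const double memory_s  = bytes_moved / (dram_gbps * 1e9);

    // The slower of the two bounds dominates the estimate.
    const double latency_ms = std::max(compute_s, memory_s) * 1e3;
    std::printf("estimated inference latency: %.2f ms\n", latency_ms);
    return 0;
}
```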
Performance optimization and tooling
• Write tooling to measure the performance of the entire on-vehicle software stack, covering both ML and non-ML components
• Port CUDA and CPU algorithms to new compute generations and optimize them for better performance
• Bring up the new generation of compute by writing new image processing drivers, assessing overall system performance, and optimizing code to run well on the new hardware
• Lead the "Latency budgets" initiative to measure the latency of key subsystems and keep them within bounds in both simulation and on-road (illustrative sketch below)
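A purely illustrative sketch of a latency-budget check: time each stage, compare against its allotted budget, and flag violations; the stage names and budget values are hypothetical.

```cpp
// Illustrative only: per-stage latency measurement against agreed budgets.
#include <chrono>
#include <cstdio>
#include <functional>
#include <string>
#include <thread>
#include <vector>

struct Stage {
    std::string name;
    double budget_ms;            // agreed per-subsystem budget
    std::function<void()> work;  // the subsystem under measurement
};

int main() {
    std::vector<Stage> stages = {
        {"image_processing", 10.0,
         [] { std::this_thread::sleep_for(std::chrono::milliseconds(4)); }},
        {"ml_inference", 25.0,
         [] { std::this_thread::sleep_for(std::chrono::milliseconds(8)); }},
    };

    bool all_ok = true;
    for (const auto& stage : stages) {
        const auto start = std::chrono::steady_clock::now();
        stage.work();
        const auto end = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(end - start).count();
        const bool ok = ms <= stage.budget_ms;
        all_ok = all_ok && ok;
        std::printf("%-18s %6.2f ms / budget %6.2f ms %s\n",
                    stage.name.c_str(), ms, stage.budget_ms, ok ? "OK" : "OVER");
    }
    return all_ok ? 0 : 1;
}
```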