Held critical roles in enabling the self-driving software stack to run on multiple generations of computing platforms; participated in the end-to-end lifecycle, including initial design, bring-up, algorithm transition and optimization, and maintenance.
Work across teams to improve platform performance by (1) analyzing modules for performance-improvement opportunities, (2) leveraging the accelerator platform by integrating libraries and developing custom GPU code, and (3) developing and monitoring platform performance metrics.
Develop, deploy, and maintain tooling to convert deep learning models/networks into inference-optimized modules; currently targeting a production inference framework (TensorRT) and investigating open-source compilers.
Led a team focused on runtime performance optimization and performance-observability tooling.
Led a team focused on identifying the algorithmic needs of neural networks and sensor ingestion flows on future computing platforms.
Architected and developed highly efficient, low-latency, in-loop image compression for the computer vision pipeline; worked with and coordinated multiple teams to test, evaluate, and deploy the platform.