2021 - 2023
PyTorch R2P, TorchX, TorchElastic, torch::deploy
Working on an open-source project optimized for fast iteration on distributed training, providing a standardized, portable, production-grade machine learning solution for PyTorch.
https://pytorch.org/torchx/latest/
https://github.com/pytorch/torchx
2018 - 2021
Distributed AI, AI Infra
I work on making large-scale machine learning more reliable, efficient, and scalable, with a focus on distributed model-parallel and data-parallel training. The largest Instagram, Facebook, and Ads models are trained on the software I work on.
Contributor to both the PyTorch and Caffe2 frameworks.