April 2026 - Present: Working on large-scale training and inference optimization for frontier LLMs and LDMs.
January 2024 - March 2026: Led the ML infrastructure used internally by ~20 teams to align multimedia generation (image, video, audio) models at Meta's Superintelligence Labs (MSL). This included the Synthetic Data Platform for large-scale synthetic data generation and curation, the AutoEval Platform for automated evals, and human annotation tooling for safety redteaming and preference labeling.
Created the following systems, initially as the sole IC and later while building the team, remaining the top code contributor and holding end-to-end responsibility for each:
•Synthetic Data Platform: A system for large-scale synthetic data generation and curation for LLM and LDM training and evaluation, auto-labeling with multimodal LLMs, and end-to-end adversarial testing. Used by 15+ teams to support launches of all Meta AI media products, Llama dataset curation, and early model prototypes, at the scale of hundreds of millions of examples.
•RedOps: A data annotation platform where humans and models work together to build and iterate on the highest-quality GenAI models and products. Now used by labelers and redteamers across multiple orgs, primarily for safety and quality alignment.
•AutoEval: A Python-based framework for building benchmark evaluators and running model evaluation workflows, used to evaluate diffusion models across more than a dozen benchmarks.