1. AI Infra - PyTorch Inference Enablement for RecSys
Post-training model optimizations to scale the sparse and sequential architectures of generative recommendation models for distributed GPU inference at massive scale (10+ TB).
2. Ads ML & AI Infra - Feature and Training Data Infra
TL on the Realtime Feature Infra team, where I improved feature freshness, enabled new feature paradigms, and modernized the company-wide feature infra stack, delivering major product wins across Ads, Feeds, IG, and Integrity teams.
• Rearchitected the realtime/streaming feature infrastructure for Ads Ranking on a Kappa architecture, improving feature freshness from 10+ min to under 10 s and delivering $100 million+ in revenue YoY.
• Enhanced the feature platform to support generating and serving near-realtime (10+ min freshness) graph learning features (e.g. PPR, GNN), widely used for user representation and ads ranking use cases (user features, ads related to a specific ad, etc.).
• Modernized the recommendation ML feature platform with a rich set of feature paradigms (event-based features, topK, latestN, etc.), achieving wide adoption across multiple FB/IG products and contributing to significant product metric wins (e.g. Reels watch time > 18%, Facebook global sessions ~2%, IG sessions > 0.11%, Feed VPV > ~2%).
• Developed several key capabilities to modernize the feature infrastructure for Integrity teams at Meta, including support for new operators, feature sharing, and seamless integration with training platforms.
3. Ads Realtime Data Infra - Audience Infra
• Applied dynamic re-sharding and Elias-Fano encoding to the data, delivering 30% storage savings on 1+ PB of data, 30%+ memory improvement, and 20%+ CPU utilization improvement, and reducing overall service start time from 1+ day to a few hours.
• Added support for new search query patterns, e.g. filtering and aggregating data based on fact tables, which can be combined with search queries.