Media Logging
* Built and maintained robust, scalable media logging + metrics infrastructure for Instagram (5M+ QPS)
* Led workstreams' (3-5 teams, 8-12 engineers) strategy and execution to hit org objectives
* Improving Service Reliability (Monitoring, Traceability, and Recovery) -> increased from 92.4% -> 97%
* Improved oncall QoL, efficiency, and time-to-mitgate by 75% via re-orgs and CI process
* Increased scalability of Video Logging infrastructure by 36% by optimizing bottleneck data operations
* Created and operationalized SLOs used as success measure for the team and its services
* Increased efficiency of systems and infrastructure resulting in cost savings of $4M / year
* Improved e2e testing speed by 10x and enabled pre-prod signal for logging<>metrics by building framework integrations for testing infra
* Reduced prevalence of fraudulent data by 23% via improved monitoring + enforcement at scale
* Led response for 50+ incidents (investigation, mitigation, prevention, and retros)
Video Logging and Delivery
* Improved video delivery data efficiency by 13% for bandwidth-bound networks
* Increased efficiency of infrastructure, resulting in savings of $1.377 M / year
* Improved regression response and mitigation time by 20% through improved monitoring and tooling
* Designed, implemented, and patented data processing paradigm saving upwards of 30% of resource cost
* Oversaw zero-downtime architecture migration for critical metric pipelines