•Developed and maintained high-performance systems for real-time bidding, ad serving, and auction platforms, supporting millions of daily transactions at sub-50ms latency.
•Led optimization of the end-to-end ML serving stack (data processing, model evaluation, feature generation, model inference, fine-tuning, A/B testing infrastructure), reducing inference costs and improving model freshness.
•Designed and productionized a highly reliable daily ML model evaluation and auto-promotion pipeline on Apache Airflow, orchestrating model validation, performance benchmarking, shadow traffic testing, and safe rollout; cut manual promotion time from days to under 4 hours with zero regression incidents.
•Built a lightweight, high-performance Java SDK integrated into serving paths; achieved sub-millisecond overhead for feature logging and bucket assignment while streaming billions of events daily to Kafka → GCS/Data Lake for offline analysis.
•Optimized the full experimentation data lifecycle (event ingestion, joining, metric computation) in the data lake, cutting average experiment analysis latency from 24+ hours to under 3 hours.
•Architected low-latency microservices that retrieve feature data from Apache Cassandra for online model serving, incorporating caching strategies that reduced latency by 60% and increased throughput.