•Designed and delivered large-scale, privacy-aware ad attribution systems to mitigate 3P cookie deprecation, enabling GDPR-compliant measurement across EU markets and billions of daily ad impressions.
•Designed and implemented an attribution-gap mitigation for the Chrome 3P cookie phase-out (Google PAA) in Amazon DSP measurement, reconstructing ~3M daily conversions and supporting reporting continuity.
•Built distributed Spark pipelines on EMR with exactly-once guarantees, implementing traffic splitting, deduplication, and ID-less attribution logic to ensure correctness under evolving privacy constraints.
•Drove the data lake migration from legacy S3 Hive datasets to Apache Iceberg, cutting storage and compute costs by ~$2.3M/year while significantly improving query performance and data reliability for Data Science and ML teams.
•Designed shared Spark–Iceberg read/write frameworks with schema validation, partition controls, and safe overwrite semantics, enabling scalable restatements, backfills, and long-term table maintenance.
•Improved system scalability and operational excellence through dashboards, alarms, EMR auto-scaling automation, and on-call SOPs; collaborated closely with Product Management, Data Science, and ML engineers to deliver under tight timelines.