# Jingwen Bai

> SDE @Amazon Ads

Location: New York, New York, United States

Profile: https://flows.cv/jingwen

I’m a Software Development Engineer at Amazon Ads, working on large-scale measurement and data platforms. I build attribution and data systems that work under changing privacy constraints, including 1P and Chrome 3P cookie deprecation. My work includes Spark-based pipelines for id-less attribution, deduplication, and restatements that support reliable reporting at scale.

I also work on data platform projects such as migrating production data lakes from S3 Hive to Apache Iceberg, with a focus on reliability, performance, and cost efficiency. Much of my day-to-day work involves data correctness, safe backfills, and operating long-running pipelines.

Earlier, I worked on stream processing at AWS Lambda and on production database migrations (MySQL to PostgreSQL). I enjoy working on practical data infrastructure problems and collaborating with engineers, data scientists, and product partners to make systems run reliably in production.

## Work Experience

### Software Engineer @ Amazon

Jan 2023 – Present | New York, United States

- Designed and delivered large-scale, privacy-aware ad attribution systems to mitigate 1P cookie deprecation, enabling GDPR-compliant measurement across EU markets and billions of daily ad impressions.
- Designed and implemented an attribution gap mitigation for the Chrome 3P cookie phase-out (Google PAA) in Amazon DSP measurement, reconstructing ~3M daily conversions and supporting reporting continuity.
- Built distributed Spark pipelines on EMR with exactly-once guarantees, implementing traffic splitting, deduplication, and id-less attribution logic to ensure correctness under evolving privacy constraints.
- Drove the data lake migration from legacy S3 Hive datasets to Apache Iceberg, cutting storage and compute costs by ~$2.3M/year while significantly improving query performance and data reliability for DS and ML teams.
- Designed shared Spark–Iceberg read/write frameworks with schema validation, partition controls, and safe overwrite semantics, enabling scalable restatements, backfills, and long-term table maintenance.
- Improved system scalability and operational excellence through dashboards, alarms, EMR auto-scaling automation, and on-call SOPs; collaborated closely with PM, Data Science, and ML engineers to deliver under tight timelines.

### SDE Intern @ Amazon Web Services (AWS)

Jan 2022 – Jan 2022 | Seattle, Washington, United States

- Developed a dynamic shard-level parallelization model for Lambda stream pollers to enhance event processing efficiency.
- Introduced a metrics-driven concurrency controller to manage hotspot shards and uneven event distribution.
- Contributed as a second inventor on two AWS patents related to dynamic shard-level parallelization and concurrency for event streams.

### Undergraduate Teaching Assistant @ The University of British Columbia

Jan 2017 – Jan 2020 | Vancouver, Canada Area

- Developed reliable code and implemented software improvements to enhance user-facing features.
- Researched and organized teaching materials for five faculty members, supporting effective learning.
- Coordinated with senior professors to facilitate postgraduate research dissertations, enhancing academic collaboration.
### Data Analyst Intern @ FORM

Jan 2020 – Jan 2020 | Vancouver, British Columbia, Canada

- Expanded the KPI report with new indicators and sections; the report serves as a key decision-making tool that compares machine learning predictions with ground truth data to measure prediction accuracy.
- Developed data management tools to synchronize timestamp fields from different hardware, such as H10 and OH1.
- Configured and maintained Jenkins to provide automated, continuous delivery of the KPI report.

### Software Engineer Intern @ Switchboard

Jan 2018 – Jan 2018 | Vancouver, British Columbia, Canada

- Led a significant data migration project, transitioning ~400GB from MySQL to PostgreSQL to enhance data consistency.
- Developed efficient script-driven backfill jobs using SQL and Python, optimizing batch sizes for seamless migration.
- Implemented rigorous data validation checks to ensure accuracy before and after the cutover.
- Coordinated the final cutover with minimal service disruption, ensuring a smooth transition for production operations.

## Education

### Master of Science - MS in Data Processing and Data Processing Technology/Technician

Columbia University

### Mathematics and Computer Science

The University of British Columbia

## Contact & Social

- LinkedIn: https://linkedin.com/in/jingwen-bai-366305149

---

Source: https://flows.cv/jingwen

JSON Resume: https://flows.cv/jingwen/resume.json

Last updated: 2026-03-29