# Jingwen Bai

> SDE @Amazon Ads

Location: New York, New York, United States
Profile: https://flows.cv/jingwen

I’m a Software Development Engineer at Amazon Ads, working on large-scale measurement and data platforms.

I build attribution and data systems that work under changing privacy constraints, including 1P and Chrome 3P cookie deprecation. My work includes Spark-based pipelines for id-less attribution, deduplication, and restatements that support reliable reporting at scale.

I also work on data platform projects such as migrating production data lakes from S3 Hive to Apache Iceberg, with a focus on reliability, performance, and cost efficiency. Much of my day-to-day work involves data correctness, safe backfills, and operating long-running pipelines.

Earlier, I worked on stream processing at AWS Lambda and on production database migrations (MySQL to PostgreSQL).

I enjoy working on practical data infrastructure problems and collaborating with engineers, data scientists, and product partners to make systems run reliably in production.

## Work Experience
### Software Engineer @ Amazon
Jan 2023 – Present | New York, United States
• Designed and delivered large-scale, privacy-aware ad attribution systems to mitigate 1P cookie deprecation, enabling GDPR-compliant measurement across EU markets and billions of daily ad impressions.
• Designed and implemented an attribution gap mitigation for Chrome 3P cookie phase-out (Google PAA) in Amazon DSP measurement, reconstructing ~3M daily conversions and supporting reporting continuity.
• Built distributed Spark pipelines on EMR with exactly-once guarantees, implementing traffic splitting, deduplication, and id-less attribution logic to ensure correctness under evolving privacy constraints.
• Drove the data lake migration from legacy S3 Hive datasets to Apache Iceberg, reducing storage and compute costs by ~$2.3M/year while significantly improving query performance and data reliability for DS and ML teams.
• Designed shared Spark–Iceberg read/write frameworks with schema validation, partition controls, and safe overwrite semantics, enabling scalable restatements, backfills, and long-term table maintenance.
• Improved system scalability and operational excellence through dashboards, alarms, EMR auto-scaling automation, and on-call SOPs; collaborated closely with PM, Data Science, and ML engineers to deliver under tight timelines.

### SDE Intern @ Amazon Web Services (AWS)
Jan 2022 – Jan 2022 | Seattle, Washington, United States
• Developed a dynamic shard-level parallelization model for Lambda stream pollers to enhance event processing efficiency.  
• Introduced a metrics-driven concurrency controller to manage hotspot shards and uneven event distribution.  
• Contributed as a second inventor on two AWS patents related to dynamic shard-level parallelization and concurrency for event streams.

### Undergraduate Teaching Assistant @ The University of British Columbia
Jan 2017 – Jan 2020 | Vancouver, Canada Area
• Developed reliable code and implemented software improvements to enhance user-facing features.  
• Researched and organized teaching materials for five faculty members, supporting effective learning.  
• Coordinated with senior professors to facilitate postgraduate research dissertations, enhancing academic collaboration.

### Data Analyst Intern @ FORM
Jan 2020 – Jan 2020 | Vancouver, British Columbia, Canada
• Expanded the KPI report with new indicators and sections; the report serves as a key decision-making tool that compares machine learning predictions with ground-truth data to measure prediction accuracy.
• Developed data management tools to synchronize timestamp fields from different hardware, such as H10 and OH1.
• Configured and maintained Jenkins to provide automated, continuous delivery of the KPI report.

### Software Engineer Intern @ Switchboard
Jan 2018 – Jan 2018 | Vancouver, British Columbia, Canada
• Led a significant data migration project, transitioning ~400GB from MySQL to PostgreSQL to enhance data consistency.  
• Developed efficient script-driven backfill jobs using SQL and Python, optimizing batch sizes for seamless migration.  
• Implemented rigorous data validation checks to ensure accuracy before and after the cutover.  
• Coordinated the final cutover with minimal service disruption, ensuring a smooth transition for production operations.


## Education
### Master of Science - MS in Data Processing and Data Processing Technology/Technician
Columbia University

### Mathematics and Computer Science
The University of British Columbia


## Contact & Social
- LinkedIn: https://linkedin.com/in/jingwen-bai-366305149

---
Source: https://flows.cv/jingwen
JSON Resume: https://flows.cv/jingwen/resume.json
Last updated: 2026-03-29