# Yijing Bai

> Staff Software Engineer at Waymo

Location: San Francisco Bay Area, United States
Profile: https://flows.cv/yijing

Accomplished Staff Software Engineer and Tech Lead at Waymo, specializing in the development of state-of-the-art generative AI for autonomous vehicle simulation. I lead teams that build and scale deep learning systems, and my work leverages cutting-edge technologies, including video diffusion (Veo3.1) and world models (Genie3), to generate realistic, multi-modal sensor data for simulating rare, long-tail events. I have also pioneered the development of agentic AI, creating Gemini-based agents that automate simulation realism triage and enable end-to-end scenario generation from natural language. As a key contributor from conception through deployment, I founded and helped scale the 3D Gaussian Splatting (3DGS) sensor simulation at Waymo to over one million reconstructions, and I have co-authored multiple publications in top-tier conferences such as NeurIPS and CVPR.

## Work Experience
### Staff Software Engineer @ Waymo
Jan 2024 – Present | Mountain View, California, United States
Tech lead for our team, developing a state-of-the-art system that generates and simulates realistic environments (scene, camera, and lidar data) for Waymo Driver evaluation and validation, using recent techniques such as video diffusion (Veo3.1), world models (Genie3), 3DGS, NeRF, and LLM agents.
- Large-Scale 3DGS Sensor Simulation: Founded and scaled the 3D Gaussian Splatting (3DGS) sensor simulation at Waymo from a small project to over one million reconstructions. This initiative reconstructs realistic 3D scenarios from sensor logs for safety evaluations and has grown from a team of 3 to a 20+ person effort, shaping Waymo's next-gen ML-based simulation.

- Sensor Diffusion: Contributed to a research project using video diffusion models to generate synthetic sensor data for testing autonomous vehicle behaviors, with work published in IROS 2025 (Oral). This evolved to use the world model, Genie3, for multi-modal (Camera, Lidar) simulation of rare events.

- SceneDiffuser and SceneDiffuser++: Founded and developed the SceneDiffuser model, the core of SceneAI, for large-scale, controlled scenario creation and traffic generation, with research published in NeurIPS 2024. SceneDiffuser++ expanded this for city-scale traffic generation, with work published in CVPR 2025.

- SceneAI: Directed the development of the generative AI engine for the team's data synthesis platform, delivering tools to generate traffic, densify rare events, and create scenarios from natural language.

- Agentic Simulation Realism Triage: Proposed and developed a Gemini-based agent to triage the realism of simulation results, enhancing its tool-use and reasoning capabilities.

- Agentic Scene Generation: Developed a Gemini-based agent using a diffusion model for end-to-end synthetic scenario generation from natural language and images. This project won a company-wide hackathon and was launched in the data synthesis platform.
- AutoDense: Developed a diffusion model to generate rare events from logs, supporting multiple launches.

### Senior Software Engineer @ Waymo
Jan 2022 – Jan 2024
Led our team's effort on three generative AI (AIGC) products with four engineers. We worked with top researchers to train deep learning diffusion models (similar to DALL·E 2 and Stable Diffusion) and built systems around them to generate realistic synthetic data at scale for running Waymo simulations and evaluating Waymo Driver behavior.
• Responsible for model design, implementation, and training; model integration with the production simulation system; cross-team collaboration; landing production usage; and engagement with onboard engineers.
• AITrafficGen: Landed our first generative AI product, AITrafficGen, which supports generating realistic traffic at large scale (>100k scenarios) to discover long-tail events. The model produces 16 agents, each with a 12 s trajectory, similar to diffusion-based video generation. The synthetic set created with it has already been used in production launches.
• AutoDense: Proposed and built our second generative AI product, for automatic densification of long-tail events. It supports constrained diffusion to generate similar rare events from seed long-tail events, densifying the evaluation signal; this is similar to image-guided diffusion. The backbone is based on Perceiver IO and uses transformers as the encoder and denoiser. We built the model as a large multi-task foundation model.
• LLM Guided SceneGen: Inspired by DALL·E 3, proposed a third generative AI product that uses a multi-modal LLM to guide our diffusion model in generating synthetic scenarios end-to-end from a natural language description and an image, boosting synthetic data generation at Waymo. The initial prototype showed very promising results.

### Senior Software Engineer @ Google
Jan 2021 – Jan 2022
Worked in the Google Assistant Evaluation Eng Team, focused on building our next-gen Assistant eval infra: Assistant Hermetic Eval, which provides high eval fidelity, better data privacy, and better scalability.
Tech-led three engineers to build two critical subsystems: output scrubbing and on-device hermetic eval.
– Scoped the open-ended problem, designed the whole project, and led three engineers in implementing it.
– Drove the collaboration with the Google Assistant Infra team, the NLG (Natural Language Generation) team, and feature teams to implement the project.
– Tackled the key problem by using semantic-based annotation in the NLG request.
– Presented the project at the org's all-hands.

### Software Engineer III @ Google
Jan 2019 – Jan 2021 | Mountain View
In the Google Assistant Evaluation Eng Team, I proposed, designed, and implemented the Assistant Eval Fidelity Toolchain, a series of tools to help improve Assistant eval fidelity, including:
– Fidelity Dashboard: Dashboard showing eval fidelity for each surface feature. Built using Google SQL and PLX.
– Fidelity Search: Automatically searches for data that can increase eval fidelity by constructing a field tree, eliminating nodes, and issuing data to the Assistant stack.
– Fidelity Bug Manager: Automatically files bugs to developers for fidelity findings from Fidelity Search.
– Planned and conducted a Fidelity Fixit event with 7 feature teams and 1 surface team, fixing 38 fidelity issues.
– Authored 31,166 lines of code for the toolchain and received 3 spot bonuses and 4 peer bonuses for it.

Designed and implemented a deep learning based side-by-side eval noise classifier:
– Designed and implemented a noise data collection pipeline in our eval tool to collect noise, join eval data, and extract noise features.
– Designed and implemented the noise classifier training pipeline using TensorFlow and TFX (TensorFlow Extended) to train the DNN model and the random forest model for noise classification.
– Designed and implemented the noise confidence metric in the eval system, which lets users automatically filter noise out of their eval reports and also provides noise feedback for training.
– Presented the classifier at the org's all-hands.
– Authored 18,539 lines of code for the project.

### Software Engineer II @ Google
Jan 2017 – Jan 2019
Worked in the Google Assistant Infra Team, where I migrated the Google Assistant Conversation State API and built a tool to audit Assistant backend failures and improve stability by injecting backend errors.

### Software Engineering Intern @ Google
Jan 2016 – Jan 2016 | Mountain View
Project: High-Performance In-Memory Aggregation Server, a core component of the next-gen multi-datacenter aggregation datastore that will store all statistics for Ads traffic

• Implemented an in-memory aggregation library and achieved a 3x performance improvement
• Designed and implemented an RPC service with gRPC, Spanner, gunit, and gmock, following Google's best practices
• Built and deployed benchmark suite for aggregation library to steer design decisions

### Software Engineer @ Baidu, Inc.
Jan 2013 – Jan 2015
Project I: Real-time advertisements report statistics system
• Processed billions of search, click, and impression logs daily
• Collaborated with 36 people from 13 teams to complete the log analysis workers
• Shortened the delay of the system's full-flow online report from 3.5 hours to 5-10 minutes

Project II: Hadoop Historical Data Management System
• Removed useless historical data based on user configuration, saving 10% of the space in the HDFS cluster
• Implemented RESTful APIs in Django and Django REST framework, and released them to another team at Baidu

### Exchange Student @ UC Berkeley
Jan 2012 – Jan 2013
Project: Taint-tracking JavaScript interpreter

- Implemented an AST interpreter written in CoffeeScript that uses dynamic taint tracking to protect sensitive data such as user cookies.
- Used the esprima library to parse JavaScript code, dynamically traced function execution, and used jsPlumb to draw the function call graph.

### Student Technology Director @ Student Innovation and Practice Center of TJUT
Jan 2012 – Jan 2012 | Tianjin City, China
Responsible for student technology development in the Student Innovation and Practice Center (SIPC); led students in developing websites and programs for our student activities and for several departments of Tianjin University of Technology.


## Education
### Artificial Intelligence
Stanford University
Jan 2020 – Jan 2022

### Master's degree in Computer Science
University of Wisconsin-Madison
Jan 2015 – Jan 2016

### Exchange in EECS
University of California, Berkeley
Jan 2012 – Jan 2013

### Bachelor's degree in Information Security
Tianjin University of Technology
Jan 2010 – Jan 2014


## Contact & Social
- LinkedIn: https://linkedin.com/in/yijingbai

---
Source: https://flows.cv/yijing
JSON Resume: https://flows.cv/yijing/resume.json
Last updated: 2026-03-23