•Designed and implemented an off-board perception data preparation pipeline in Python and Spark, processing over 20,000 perception data items and providing the modeling team with reliable, quality-checked data
•Developed the Geo-Split data splitting algorithm, which partitions data by geographic location, reducing overfitting and increasing perception accuracy by 1% to 3% depending on the model
•Deployed pipeline on AWS EC2 and S3 using Docker and Spinnaker
•Analyzed and visualized data using pandas and Bokeh, generating over 40 data-quality metrics across multiple dimensions and improving label density by 27%
•Integrated the pipeline with the VerCD continuous delivery system to deliver data and reports to the modeling team