• Implement streaming and batch Apache Beam / Google Cloud Dataflow pipelines for data transformation in Python and Scala. Projects have included a streaming Scala pipeline that predicts payment fraud and a batch Python pipeline that converts raw Premise App submission data into KMZ files for download by Premise's clients and Data Science team.
• Develop batch Airflow data pipelines in Python. These range from a simple pipeline that ingests Facebook ad data into a Google BigQuery table to more complex ones, such as a pipeline that queries Premise App submission data with SQL, scores it with a Random Forest fraud-prediction model running in a Docker container, and writes the results to BigQuery. Built a delta ETL in Airflow that pre-processes BigQuery data for data analysts, cutting costs by hundreds of dollars per month.
• Build and support Python services for fraud detection, quality control, and data management. Key contributions include automated quality-control modules, built as Flask REST APIs, that assess incoming Premise App submission data. One such module uses a computer vision model developed by the Data Science team to predict how likely each incoming photo is to show the expected content.
• Provide support to the Data Science, Growth, Engineering, Product, and Management teams, explaining data engineering technologies and concepts.