Integrated data from many disparate sources into pipelines for building and evaluating sophisticated simulations of city transportation networks. Built tooling to verify data integrity and to coordinate large, complex computations at scale.
Projects
Data-Product ETL Pipeline:
Designed and implemented an ETL pipeline that moves generated models from their raw form in Google BigQuery into a fast production database backing the end-user platform. Used Prefect to manage the dataflow DAG, which ran on Dask deployed via Helm on Kubernetes.
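The pipeline's extract-transform-load shape can be sketched as below. This is a minimal illustration, not the actual implementation: the function names and the in-memory stand-ins for BigQuery and the production store are hypothetical, and in the real pipeline each stage would be wrapped as a Prefect task so the resulting DAG could execute on Dask.

```python
# Hypothetical ETL sketch; the real pipeline wrapped each stage in a
# Prefect task and ran the resulting DAG on a Dask cluster.

def extract_models(warehouse: dict) -> list[dict]:
    """Pull finished model rows from the warehouse (stand-in for BigQuery)."""
    return [row for row in warehouse["models"] if row.get("status") == "ready"]

def transform(rows: list[dict]) -> list[dict]:
    """Reshape raw rows into the compact form the end-user platform reads."""
    return [{"id": r["id"], "score": round(r["raw_score"], 2)} for r in rows]

def load(rows: list[dict], prod_db: dict) -> int:
    """Upsert transformed rows into the fast production store; return count."""
    for r in rows:
        prod_db[r["id"]] = r
    return len(rows)

# Toy data standing in for the warehouse and production database.
warehouse = {"models": [
    {"id": "a", "status": "ready", "raw_score": 0.914},
    {"id": "b", "status": "pending", "raw_score": 0.5},
]}
prod_db: dict = {}
loaded = load(transform(extract_models(warehouse)), prod_db)
```

Keeping each stage a pure function of its inputs is what lets an orchestrator like Prefect retry or parallelize stages independently.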
Artigraph:
Helped design and begin implementing a data-dependency and metadata-tracking library. Rather than being computation-centric like most dataflow libraries (Airflow, Prefect), it is data-centric: it focuses on the semantic relationships between artifacts, abstracting over (and providing implementations of) different storage backends, file formats, and in-memory representations.
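The data-centric idea can be illustrated with a short sketch. Everything here is hypothetical and not Artigraph's actual API: an `Artifact` names a piece of data and its upstream dependencies, while the storage backend and file format are pluggable pieces chosen independently of the artifact's meaning.

```python
# Hypothetical sketch of a data-centric design: the Artifact describes
# *what* the data is; storage and serialization are swappable details.
import json
from dataclasses import dataclass, field

class MemoryBackend:
    """Stand-in storage backend; a real one might target GCS or a database."""
    def __init__(self):
        self._blobs: dict[str, bytes] = {}
    def write(self, key: str, data: bytes) -> None:
        self._blobs[key] = data
    def read(self, key: str) -> bytes:
        return self._blobs[key]

class JsonFormat:
    """Stand-in file format; others could serialize Parquet, CSV, etc."""
    def dump(self, obj) -> bytes:
        return json.dumps(obj).encode()
    def load(self, data: bytes):
        return json.loads(data.decode())

@dataclass
class Artifact:
    """Semantic handle on a piece of data, independent of how it is stored."""
    key: str
    storage: MemoryBackend
    fmt: JsonFormat
    depends_on: list = field(default_factory=list)  # upstream artifacts

    def save(self, obj) -> None:
        self.storage.write(self.key, self.fmt.dump(obj))

    def load(self):
        return self.fmt.load(self.storage.read(self.key))

backend = MemoryBackend()
raw = Artifact("raw-trips", backend, JsonFormat())
clean = Artifact("clean-trips", backend, JsonFormat(), depends_on=[raw])
raw.save([{"trip": 1}])
clean.save(raw.load())  # downstream artifact derived from its dependency
```

Because dependencies are declared between artifacts rather than between tasks, lineage questions ("what data did this come from?") fall out of the metadata instead of being reverse-engineered from a task graph.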