# David T. H.

> Software Engineer

Location: New York, New York, United States
Profile: https://flows.cv/davidth

## Work Experience

### Software Engineer @ Zipline AI
Jan 2024 – Present

- Building and deploying the Zipline AI control plane on top of Chronon OSS (https://chronon.ai/), integrating it with multiple clouds (GCP and AWS) to manage lifecycle execution of Spark and Flink jobs on Dataproc / EMR and to perform compute sharing.
- Expanded data interoperability of Spark jobs to support heterogeneous sources, including Apache Iceberg, BigQuery native/external tables, and the Apache Hudi format.
- Optimized application stability and resource footprint by diagnosing and resolving critical Docker JVM memory leaks using Java Flight Recorder for heap and thread dump analysis.
- Reduced feature fetching latency by 30% by migrating payload serialization from JSON to Avro binary, significantly decreasing vector payload sizes and network overhead.
- Stabilized the deployment pipeline by implementing safe, idempotent database migrations using Liquibase and containerizing the full local development environment via Docker.
- Added a robust end-to-end integration testing suite covering the CLI, compute engine, and inference service across different cloud providers.
- Tuned Spark and Flink configurations to ensure job stability. For Spark, analyzed jobs to identify bottlenecks such as skew and adjusted configurations appropriately; for Flink, added new operators to expand stream processing logic, such as writing out to new outputs.

Scala + Spark + Flink + Iceberg + Docker + Liquibase + Hudi + Dataproc + EMR + Bigtable + Postgres

### Software Engineer @ Stripe
Jan 2021 – Jan 2024

- Machine learning infrastructure engineer on Stripe's feature engineering platform, Shepherd (https://stripe.com/blog/shepherd-how-stripe-adapted-chronon-to-scale-ml-feature-development). Spark + Flink + Airflow + Scala + Java + Hive + Iceberg
  - Led migration of Stripe's early merchant fraud detection offline ML model from the legacy feature engineering platform to Shepherd. Implemented the Shepherd-based features and conducted extensive offline evaluation to ensure the new features matched the old ones. Backtested features to confirm score distributions and recall of the new features were in line with the pre-existing features.
  - Led technical implementation of the first asynchronous Shepherd-based ML model for merchant fraud. Built out the online flow: an event consumer subscribed to various Kafka topics that fetched online features from the feature store and conditionally triggered model scoring downstream.
  - Worked with a team of ML engineers to assemble an automated backtesting and training data pipeline using offline computed point-in-time training data involving Airflow + Flyte + Iceberg.
- Led technical design and development of the first cut of Stripe's new core product infrastructure managing multi-entity accounts/businesses (see https://docs.stripe.com/get-started/account/orgs). Responsible for ideation and development of the read-optimized tree graph storage strategy to represent a Stripe enterprise business with multiple accounts and/or entities. Built read and write APIs and workflows, including various locking schemes, to support concurrent updates to the tree while maintaining correctness. Java + RPC + Protobuf + Bazel + MySQL + Mongo
- Developed real-time stream processing jobs using Flink + Scala + Bazel + Kafka to monitor the merchant experience of Stripe's customer base across the world (a minimal illustrative sketch follows below).
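The last bullet above describes Flink + Scala + Kafka stream processing for merchant-experience monitoring. The sketch below is a minimal illustration of that pattern, not the actual Stripe job: the topic, consumer group, event fields, and the per-country failure count are all assumptions made for the example.

```scala
// Minimal sketch: consume merchant-experience events from Kafka, key by country,
// and count failures per 5-minute window. All names and fields here are hypothetical.
import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.connector.kafka.source.KafkaSource
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

// Hypothetical event shape; a real job would deserialize the Kafka payload (JSON/Avro).
case class MerchantEvent(merchantId: String, country: String, failed: Boolean)

object MerchantExperienceJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Kafka source; broker address and topic are placeholders.
    val source = KafkaSource.builder[String]()
      .setBootstrapServers("kafka:9092")
      .setTopics("merchant-experience-events")
      .setGroupId("merchant-experience-monitor")
      .setStartingOffsets(OffsetsInitializer.latest())
      .setValueOnlyDeserializer(new SimpleStringSchema())
      .build()

    env
      .fromSource(source, WatermarkStrategy.noWatermarks(), "merchant-events")
      .map(parse _)                                   // raw record -> MerchantEvent
      .filter(_.failed)                               // keep only degraded experiences
      .map(e => (e.country, 1L))                      // one failure per event
      .keyBy(_._1)                                    // key by country
      .window(TumblingProcessingTimeWindows.of(Time.minutes(5)))
      .sum(1)                                         // failures per country per window
      .print()                                        // stand-in for a metrics/alerting sink

    env.execute("merchant-experience-monitor")
  }

  // Toy parser assuming a comma-separated payload; a production job would use a schema.
  private def parse(raw: String): MerchantEvent = {
    val fields = raw.split(",")
    MerchantEvent(fields(0), fields(1), fields(2).toBoolean)
  }
}
```

In a production setting the `print()` sink would be replaced with an alerting or metrics sink, and event-time windows with watermarks would likely be preferred over processing time.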
### Analytics and Data Engineer @ AgileMD
Jan 2021 – Jan 2021

Led and designed architecture plans to upgrade the existing data pipeline infrastructure away from MongoDB, including:
- a pros-and-cons proposal for replacing MongoDB with AWS Redshift or Snowflake
- a proof of concept detailing the use of Airbyte to load AgileMD data from multiple sources into AWS Redshift, with dbt owning transformations within Redshift
- a cost-benefit analysis of building and hosting the open source data frameworks listed above versus buying a managed ETL/ELT service

### Software Engineer @ Amazon
Jan 2019 – Jan 2021

- Built asynchronous microservices in Java using AWS CloudFormation, Lambda, DynamoDB, SQS, and EventBridge to solve problems in cross-border movement for Amazon Logistics.
- Architected and developed an end-to-end solution to migrate production data in AWS DynamoDB to the internal Amazon data lake for further processing by dependent business intelligence and data engineering teams, using AWS CloudWatch Events, S3, and SNS.
- Led the performance readiness assessment for the team's services (> 10) in preparation for increased traffic during the 2020 Q4 holiday season. Analyzed previous traffic patterns and initiated load testing of individual services to determine whether service limits would handle the forecasted traffic. Horizontally or vertically scaled the team's services on a case-by-case basis (larger AWS EC2 instances, more EC2 instances, configuring AWS DynamoDB autoscaling, etc.).
- Completed the integration of Amazon Devices Logistics onto the team's platform and served as lead and primary point of contact for new technical issues and features.

### Software Engineer II @ Petal
Jan 2019 – Jan 2019

### Software Engineer @ Petal
Jan 2018 – Jan 2019 | Greater New York City Area

Transforming traditional credit underwriting to better serve credit invisibles.

- Led development of an internal web application to expose sensitive customer information (PII) to stakeholders using jQuery, Python + Flask, Postgres + Redshift, and secret management with HashiCorp Vault. Deployed the application using Docker, HashiCorp Nomad, and AWS EC2. The web application was tightly integrated with the day-to-day operations of the risk, finance, and customer experience teams.
- Architected a solution to automate and centralize all data into AWS Redshift to build business dashboards showcasing insights from application/platform data combined with third-party reporting. Used change data capture in AWS Database Migration Service to stream data from RDS Postgres to AWS S3, then batch processed it into AWS Redshift. Managed infrastructure using Terraform.
- Coordinated data engineering efforts with stakeholders such as risk, finance, and customer support analysts to model data in Redshift and increase operational efficiency. Created dashboards for technical and non-technical teams to monitor business and engineering metrics with SQL and a business intelligence tool (Periscope).
- Drove initiatives to increase observability in production backend systems. Created monitoring and alerting around the backend Flask app through Prometheus + Grafana, PagerDuty, and Slack to provide coverage of the health status and related APIs of the registration and dashboard website.
- Re-architected backend mono-repository testing so that each component of the repo has its own seeded Postgres Docker database, enabling parallelized unit tests. Decreased average testing time from 30 to 10 minutes.
- Developed ETL pipelines in Python to batch process files in a variety of formats stored in AWS S3 and ingest them into AWS Redshift. Used open source Concourse CI and later Jenkins to schedule cron jobs. Reduced latency of ETL jobs across the company by three times through multiprocessing in Python.

### Software Engineering Intern @ Capital One
Jan 2017 – Jan 2017 | Richmond, Virginia Area

- Migrated the existing audit flow containing logs for customers of the commercial bank website Intellix from IBM's DB2 mainframe to a Postgres database using AWS CloudFormation.
- Worked with database administrators to implement new security groups, various login privileges, unique indexes, and the schema for the Postgres database using pgAdmin 4.
- Developed a REST API for the Postgres database using Java Spring, Spring Boot, and Jersey to allow faster access to customer logs, helping customer support troubleshoot issues and developers investigate software bugs for Capital One's commercial bank website, Intellix. Implemented a caching feature and configured time-to-live to reduce latency for database calls. Developed a feature that exported and encrypted large collections of data to a text file and uploaded it to an Amazon S3 bucket.

### iOS Engineer Intern @ SQL Sentry
Jan 2016 – Jan 2016 | Huntersville, NC

- Developed UI that displayed all user-flagged data from a SQLite backend by creating a table view with collapsible sections and dynamically changing table cell heights.
- Handled user interaction for de-flagging data and the resulting removal of table cells.

### Electrical Engineering Intern @ Duke University
Jan 2015 – Jan 2016 | Durham, NC

Research under the Nanomaterials and Thin Films Lab at Duke University. Programmed an algorithm in Mathematica to obtain structural parameters of carbon nanotubes using various statistical distributions.

### Research Assistant @ Elon University
Jan 2015 – Jan 2015 | Elon, NC

Research under Dr. Scott Spurlock. Applied feature extraction and dimensionality reduction (machine learning) to video frames to analyze human movement using MATLAB.

### Research Assistant @ Elon University
Jan 2013 – Jan 2015 | Elon, NC

Conducted research under Dr. Benjamin Evans. Investigated a model to explore the feasibility of a variation on traditional field geometries for magnetic nanoparticle hyperthermia cancer treatment. Used video microscopy to characterize magnetic forces on a novel magnetic microsphere. https://cismm.web.unc.edu/wp-content/uploads/sites/9983/2016/02/Evans_Han_High-Permeability_Elsevier.pdf

## Education

### Bachelor of Science (B.S.) in Computer Engineering
Columbia University

### Bachelor of Science (B.S.) in Engineering Physics
Elon University

## Contact & Social

- LinkedIn: https://linkedin.com/in/david-t-han

---

Source: https://flows.cv/davidth
JSON Resume: https://flows.cv/davidth/resume.json
Last updated: 2026-04-05