I’m a backend engineer who builds tools and processes that leverage the cloud and distributed systems to manage and analyze large amounts of data. For my next role, I’d like to work on problems of scale, large data infrastructure, engineer enablement, and processes for ETL and reporting.
New York, New York, United States
Reduced multi-million-dollar spend on our Snowflake data warehouse by 20% by creating a routing strategy that matched usage to different warehouses based on timing needs, such as precompiling reports and priority-based access.
Compiled organization-wide usage data to justify warehouse spend, coordinating with 6 key stakeholder teams; the analysis was a key input in negotiating an annual contract with Snowflake.
Designed the permission system for Snowflake: led and implemented team-based authorization, created the concept of a Service Account User, and wrote a Python library enabling previously inaccessible programmatic usage through APIs and Jupyter notebooks.
Centralized credentials in Vault, requiring all programmatic access to retrieve them through this interface and eliminating unencrypted or fragmented credential storage (a pattern sketched below).
Standardized six teams’ varying data-object YAML schemas, then led the migration of the existing YAML, data pipelines, and ingestion code to the new standard.
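For illustration, a minimal sketch of the Vault-backed Snowflake access pattern described above, assuming the hvac and snowflake-connector-python packages; the secret path layout, field names, and helper name are hypothetical, not the production implementation:

```python
import os

import hvac
import snowflake.connector


def connect_from_vault(secret_path: str, warehouse: str):
    """Fetch service-account credentials from Vault and open a Snowflake
    connection, so no unencrypted credentials ever touch disk.
    Illustrative sketch: the path and field names are hypothetical."""
    client = hvac.Client(url=os.environ["VAULT_ADDR"], token=os.environ["VAULT_TOKEN"])
    secret = client.secrets.kv.v2.read_secret_version(path=secret_path)
    creds = secret["data"]["data"]  # KV v2 nests the payload under data -> data
    return snowflake.connector.connect(
        user=creds["user"],
        password=creds["password"],
        account=creds["account"],
        warehouse=warehouse,
    )
```

Routing every caller through a helper like this is what lets a team rotate credentials or reroute warehouses without touching notebook or API code.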
New York, New York, United States
Project owner and lead maintainer of multiple CI/CD pipelines for code gating, coverage analysis, and scheduled health checks on deployed environments for a team of 13 engineers.
Implemented Cypress infrastructure for reliable, performant testing of the team’s codebase; single-handedly increased code coverage from 20% to 54% before handing off to QA and full-stack developers to maintain and extend.
Produced a set of Cypress scripts that rapidly seed new tenants with data through the application’s UI, allowing QA to quickly and repeatedly test specific scenarios and saving 400 hours of QA time.
Architected 3 build pipelines that produce a completely fresh build of the application before testing: provisioning cloud databases, coordinating multiple services against the provisioned resources, running tests in parallel, and publishing analysis.
Designed a custom virtual machine image preconfigured with software and specs optimized for our testing, reducing pipeline run time from 35 minutes to 15 minutes.
Optimized agent pool scaling settings to decrease queue time from 15 minutes to under one minute, cutting the time for a pull request to pass tests by 30%.
2019 — 2020
New York, New York, United States
Built a Node and Express API for a robot-management service, exposing the application’s data resources to developers and researching REST design best practices to adhere to the OpenAPI 3.0 standard.
Wrote a Postman test suite with over 1,200 regression tests, with configurations to run against various environments.
Spearheaded the use of Swagger docs in backend services, auto-documenting the application schema in detail.
New Haven, Connecticut
Develop performant frontend data analysis and management tools using React, Redux, and D3.js
Use Python and Postgres to write APIs and ETL processes that aggregate and measure client data
Handle the team’s software packaging, builds, and deployments to UAT and production using Ansible and Vagrant
Work closely with data analysts to create Postgres data models describing client healthcare information
Maintain 90% test coverage across the team’s code using PBBT, doctest, and unittest
Some features I have implemented:
Administrator portal using React, Python, and Postgres to help physicians analyze and manage patient data
Analysis dashboard that visualizes research data using D3.js
API that munges healthcare data and transports it to the UAT database to assist testing
API and backend service that analyze thousands of client records daily to produce health outcome scores (sketched below)
Test generator that consumes JSON data schemas and tests them against implemented resource definitions
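For illustration, a minimal sketch of the kind of aggregation behind the health outcome score service above, assuming psycopg2; the table, columns, and weights are hypothetical stand-ins for the real clinical model:

```python
import psycopg2


def compute_outcome_scores(dsn: str) -> list[tuple]:
    """Aggregate the previous day's client measurements into one health
    outcome score per client. Table, columns, and weights are illustrative."""
    query = """
        SELECT client_id,
               0.6 * AVG(adherence) + 0.4 * AVG(vitals_score) AS outcome_score
        FROM daily_measurements
        WHERE measured_at >= CURRENT_DATE - INTERVAL '1 day'
        GROUP BY client_id;
    """
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(query)
            return cur.fetchall()
```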
Greater New York City Area
Uploaded massive volumes of transactions by creating data pipelines into SQL databases (sketched below).
Validated and munged financial documents using Python data-manipulation libraries.
Used an email library to record the upload process and send stack traces for error handling.
Maintained wide unit test coverage so that future features are easily integrated.
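For illustration, a minimal sketch of the upload-and-alert pattern from the bullets above, assuming pandas, SQLAlchemy, and the standard library’s smtplib and email modules; the file layout, table name, and addresses are hypothetical:

```python
import smtplib
import traceback
from email.message import EmailMessage

import pandas as pd
from sqlalchemy import create_engine


def upload_transactions(csv_path: str, db_url: str, alert_to: str) -> None:
    """Validate a batch of transactions and append it to a SQL table;
    on failure, email the stack trace. Names here are illustrative."""
    try:
        df = pd.read_csv(csv_path)
        df = df.dropna(subset=["account_id", "amount"])  # basic validation
        engine = create_engine(db_url)
        df.to_sql("transactions", engine, if_exists="append", index=False)
    except Exception:
        msg = EmailMessage()
        msg["Subject"] = "Transaction upload failed"
        msg["From"] = "etl@example.com"
        msg["To"] = alert_to
        msg.set_content(traceback.format_exc())
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)
        raise
```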
Education
University of Chicago