Experience
2022 — Now
San Francisco Bay Area
Staff engineer working on the offline data storage team; running HDFS at EB scale. Some notable achievements:
• Designed & built 2 POCs for a single-tenant distributed cache to eliminate noisy-neighbor impact as part of a cross-org effort, yielding a ~20% AI training time speedup with lower variance in production. One POC was built in-house over 6 months and the other using GPFS.
• Migrated 1+ EB across 12 clusters to a new data center with new hardware SKUs with zero downtime. I designed and executed the migration, with 6 months of preparation and 8 months of execution and negligible impact. Carried out with a team of 5 engineers, coordinating with 3 other infrastructure teams.
• Led an overhaul of HDFS deployment automation across 3 initiatives: designed locking tenants, data loss prevention, batching heuristics, deployment safety, and deployment stability in a 50k+ node fleet with 5+ EB of data.
• Established an operational excellence framework that introduced feedback loops on human efficiency, deployment health monitoring, incident mitigation time, etc. across 3 teams. This has led to 100+ identified opportunities and per-component execution plans to bring all metrics to new targets.
2020 — 2022
Seattle, Washington
Spent time working on the early stages and initial designs of supporting Prime off Amazon. Notably:
• Led the design of the entire affiliate marketing attribution system for Prime off Amazon. This cross-org initiative had a team of 8 collaborating with a team of 30 over 12 months. I defined the architecture and negotiated the high-level contracts & SLAs between teams to achieve 99.99% availability & completeness.
• Led operational and security efforts on a console platform. Dozens of teams were integrating and each had to meet the Amazon standards on its own, so I led the work to minimize the redundant effort required of each integrating team.
2016 — 2020
Seattle, Washington
At AWS I worked on AWS CloudFormation, AWS's Infrastructure as Code (IaC) service. I touched most components of the service in one form or another and became the team's subject matter expert in a few of them. Here I gained extensive exposure to designing, building, and operating distributed systems at AWS's massive scale.
A handful of the projects I've been involved with:
• Led a cross-team initiative to overhaul the CloudFormation provisioning engine to build a platform that enables customers to influence how provisioning works with custom logic that can be vended to other customers. We worked closely with another team (AWS CodeDeploy) to deliver a Blue-Green deployment behavior on this platform: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/blue-green.html
• Automated deployment processes for 5 services via CI/CD pipelines with auto-rollback strategies, safe deployment parallelization, deployment windows, proper testing, etc.
• Started an initiative to prioritize operational health in the org across 5 teams via improving automation, redefining alarming strategy, redistributing operational ownership, and pushing for prioritizing fixes
• Designed and built a system responsible for crawling our data stores to identify & clean up PII data for GDPR compliance with a 100% completeness guarantee
• Enhanced operational and region-bootstrapping automation to build out the service
• Some other yet-to-be-public projects
2014 — 2016
Rochester, Michigan
Student IT job that consisted of supporting the technology throughout campus housing and assisting students. Among other miscellaneous tasks, common work was reformatting computers, virus removal, and registering users on the network.
2015 — 2015
Seattle, Washington
I spent the summer on the AWS CloudFormation team. Most of my time was spent making various enhancements to the internal operational tooling used by the team and by support engineers.
Education
Oakland University