Bringing reliability practices closer to engineers by building out frameworks for end-to-end testing and integrating these tests with development workflows. Working to zero out the number of incidents caused by under-tested code and workflows reaching production :scream:
Built out load-testing, chaos-injection, and monitoring tooling to improve the reliability of Block's existing systems and ensure new systems could seamlessly replace our largest monoliths as we decomposed them into more-maintainable services. Helped ensure that large system shifts from on-prem to AWS went smoothly and protected business continuity. (I.e. I gently broke critical business infrastructure in production and hoped customers didn't notice.)
Semi-related: Kotlin became my favorite language, and k9s is a fabulous Kubernetes CLI #notsponsored
Led a distributed team focused on architecting the Node.js frameworks and gRPC interfaces that 70 engineers built on to support integrations with 12,000+ data partners.
We partnered with dev teams across the org to modify service contracts and inter-service communication channels for our most business-critical workflows with (hopefully) zero downtime (but yes, I probably was to blame for those times when your fave fintech app stopped working... 😬).
Led a team primarily responsible for building and scaling the tooling and services behind Plaid's integrations with financial institutions, in ways that minimized touchpoints with sensitive information and promoted engineer velocity.
Had a summer from hell where I sent myself 13,000 notifications as my team and I attempted to dial in an alerting story that would allow a single on-call engineer to monitor all of Plaid's integrations 24/7 while maintaining their sanity. We ended up with something beautiful that typically let us know that banks were down before they even realized there was a problem (I won't name names, but you know who you are, JPMC). Definitely had to change my ringtone afterwards; iykyk.
We used tools like Prometheus, Grafana, and Kibana in our day-to-day work, as well as TypeScript, Go, and Python running in Docker on ECS and K8s. Most of us also spent an inordinate amount of time writing very small functions in Bash for our service startup scripts (I still Google every time I have to write a conditional; don't judge me) or fighting YAML for our legacy CloudFormation templates.