Experience
2021 — 2025
San Francisco Bay Area
• Designed, planned, and executed migration of the largest semi-stateful Node.js service from EC2 VMs to Kubernetes, enabling safe autoscaling and canary analysis
• Designed, planned, and drove implementation of multi-cluster workload controller to manage safer rollout and migration of tens of thousands of pods
• Rapidly iterated with customer team on new Kubernetes cluster design, enabling data analysis workloads and AI-powered product features
• Deployed and validated vendor's solution to production in 3 weeks while porting provided configs from CloudFormation to Terraform, ensuring HIPAA compliance for customer support
• Migrated all workloads across major Ubuntu versions without incidents or downtime
• Traced nondeterministic bug with pod security groups to AWS infrastructure, crafted repro for AWS support, and persistently followed up until bugfix went live
• Onboarded and mentored 2 new hires on the team
2017 — 2021
San Francisco Bay Area
• Proactively shipped 20+ improvements in core sharing code to reduce latency of new APIs by up to 65% and existing public APIs by up to 30%
• Led cross-team collaboration to design, implement link permissions in next-gen Dropbox filesystem while preserving compatibility with legacy systems
• Implemented next-gen data block vacuuming system in Go and PySpark with 3 collaborators. Proactively cut run latency by 50% after system shipped to production
• Mentored 2 summer interns, once as primary mentor, and onboarded 1 new hire to team
2016 — 2016
San Francisco Bay Area
• Added widely relevant performance metrics to core frameworks used by content ads serving stack, web search, and other major product areas
• Extended performance measurement for Google-wide thread-level task executor to enable collection, reporting of high-precision queuing delays for closures and alarms
• Extended interface for widely used DAG task scheduling framework to measure queuing delays and start times of internal tasks
• Wrote analytical latency simulator to understand client behavior with high fan-out server calls
2015 — 2015
Sunnyvale, CA
• Helped reduce time needed to add email protection server to existing cluster by a factor of 5, removing the 24-machine size limit being encountered in the field
• Ported, benchmarked server management API to Go as a drop-in replacement for existing Perl implementation. Presented speed comparisons, memory savings, other results at showcase to all employees at corporate HQ
• Traced, fixed 9-year-old bug buried in 10,000 lines of code and present in all supported servers in production. Prevented customers from being locked out of their encrypted email
• Designed, implemented API for targeted service restarts following configuration changes. Enabled faster responses from admin panel and reduced downtime
Education
University of California, Berkeley
Master of Engineering
University of California, Berkeley