Specialties: Distributed Systems, Nomad, Consul, Web Apps, Identity Management, LDAP, Kerberos
Experience
2019 — Now
2015 — 2019
Palo Alto, CA
Team lead and architect on a new mission-critical global monitoring platform:
• Designed and developed horizontally scalable and highly available distributed monitoring systems across production data centers using Nagios, Consul, and Python
• Eliminated global and datacenter-level single points of failure
• Enabled horizontal scaling of monitoring traffic by simply spinning up new monitoring server instances; each new instance joins the group and configures itself automatically
• Worked with Operations to replace the legacy monitoring system, itself a single point of failure, with the new distributed monitoring system across datacenters around the globe
• Failure rate dropped from at least once a month to zero in the year after rollout
Individual contributor on production platform team:
• Created a package deployment visualization tool to track the progress of package rollouts and monitor host health using Python, Flask, SQLAlchemy, SQLite, Postgres, jQuery, Bootstrap, and the Puppet API
• Created a tool to cluster hosts by various attributes with machine learning, using Python and scikit-learn's Affinity Propagation
• Prototyped an alert deduplication system using Riemann and Clojure
• Created a continuous integration and deployment pipeline built around Nomad, Docker, Jenkins, and a Nomad orchestrator written in Go
Team lead for internal infrastructure development:
• Helped roll out Gerrit and pre-checkin code verification into the development process
• Improved code verification and release process
• Wrote plans and created architecture to improve verification, release, and deployment process
• Deployed Docker and integrated it into the development process for containment, repeatability, and higher parallelism
• Wrote a lightweight API proxy to aggregate Docker hosts to improve availability and throughput using Python and Flask
• Reduced verification time from over an hour to around 10 minutes
• Reduced the release cycle from 2-3 weeks to 2-3 days
2007 — 2015
Engineering Lead for vCloud Director 1.5
o Managed a combined group of 40 developers and QA across 3 geographies
o Shipped on time while meeting all feature and bug-fix milestones
Created a service that automatically selects the test cases relevant to a change set, based on the code it modifies, using Python, Java, Redis, and ObjectWeb’s ASM libraries
o Yields a 20% reduction in testing time on average, and up to 90% in some cases
o Detects code changes that have no test coverage and warns the engineer
o Collects code coverage data for individual test cases and analyzes each test case to improve test quality
Created an automatic bug filing service to examine automated testing logs and file bugs against the appropriate component using Python, SQLAlchemy, and Postgres
Created a scalable, redundant distributed system to provision virtual machines in advance of workloads using Python, RabbitMQ, Graphite, and Redis
o Reduced automated testing time from 40 minutes to 5 minutes and test environment provisioning time to zero
o Effectively eliminated infrastructure failure as a source of automated test failures
o Automatically scales up or down based on load predicted from historical data
Wrote a Ruby SDK for vCloud Director and used it to enable Cloud Foundry's BOSH to run on vCloud Director
Designed and implemented a new user and group management system for vCloud Director 1.0
Engineering Lead for Lab Manager 3.0.1
Web UI development and new LDAP implementation to support Microsoft Active Directory and OpenLDAP for Lab Manager 3.0.
Web UI development for Lab Manager 2.5.
2004 — 2007
Rolled out SharePoint 2007. Used Google Search Appliance (GSA) to search the wiki and developed a component to integrate GSA search into SharePoint.
Developed a new equipment inventory system using .NET Framework/C# and Microsoft SQL Server 2000.
Developed a program to monitor TIBCO bridges to satellite offices.
MCSA/MCSE certified.
Education
Yale University