Edinburgh, Scotland, United Kingdom
I Designed and implemented tooling in python to create automated randomized chaos scenarios to test reliability and resilience on hyper-scale Azure Kubernetes deployments.
The tooling was designed to be as user friendly as possible while also giving the user a deep level of control over the tests that were run. The primary focus of the tool was to allow for testers to inject faults such as killing pods, introducing latency in the network and blocking access to specific Azure Services in a standardized manner that abstracted away its implementation.
Furthermore, in the interest of ensuring the tool was extensible, it was designed and documented with methods to integrate the chaos tooling with external tools and tests for further project-specific reliability testing integrations ensuring 99.999% project up-time.
Received return offer.