United States
Identified and integrated GoAlert, an open-source incident management tool, to streamline on-call workflows, improve incident handling, and reduce mean time to resolution (MTTR).
United States
Owned the incident response process and led a team of engineers to improve its efficiency and effectiveness, reducing MTTR, mean time to detect (MTTD), and mean time to acknowledge (MTTA) by 30%.
Conducted regular game days to identify gaps in the incident response process, then automated the process using Rootly, increasing incident resolution speed by 40%.
Designed and implemented an operational readiness review (ORR) process for microservices, including service tiering and scorecards for readiness checks before production deployment, resulting in a 50% reduction in production incidents.
Spearheaded the Technical Excellence Program, significantly improving engineering standards and service reliability by regularly analyzing key metrics, including incident rates, bug-tracking data, user experience, SLO compliance for latency and availability, error budgets, and burnout rates.
Identified critical user journeys (CUJs).
Developed a service catalog solution using Terraform modules.
Enforced Datadog synthetic tests for the identified flows.
Consolidated SaaS tools to reduce costs.
Created Slack applications to automate incident response processes, enabling users to trigger PagerDuty incidents directly from Slack.
Built and automated PagerDuty modules to manage on-call schedules and maintain PagerDuty resources.
2021 — 2022
United States
Deployed infrastructure resources using Pulumi with TypeScript.
Set up GitHub Actions workflows for continuous deployment.
Configured Dependabot in GitHub to keep all packages up to date.
Enforced standards for Kubernetes manifests, ensuring that all manifests met company best practices for security and scalability.
Static code analysis for infrastructure as code (IaC) and software composition analysis (SCA) for Docker images and open-source packages.
Designed and implemented production access control for EKS clusters.
Wrote AWS Lambda functions to update DynamoDB tables and the aws-auth ConfigMap.
Wrote test cases in Golang using the Ginkgo testing framework.
Participated in a follow-the-sun on-call rotation for production infrastructure.
Automated EKS and GKE upgrades using Pulumi.
Collaborated with development teams to integrate Pulumi into their development processes, improving the overall speed and efficiency of infrastructure deployments.
2018 — 2021
United States
Design, deployment, support, and maintenance of an open-source AWX high-availability (HA) cluster.
Authentication and authorization.
Design, deployment, support, and maintenance of Elasticsearch on Kubernetes.
Deployment, support, and maintenance of RabbitMQ on Kubernetes.
Enforcement of custom policies on Kubernetes objects using Open Policy Agent (OPA).
Twelve-Factor App guidelines.
Observability stack.
Cross-data-center replication.
Enterprise Mesosphere cluster.
Private Docker registry.
Implementation of disaster recovery (DR) between data centers in different regions.
Blue-green deployment with zero downtime.
HAProxy rate limiting.
Local universe for DC/OS.
EKS cluster with Terraform.
AWS Lambda functions.
Kubernetes Kustomize.
2017 — 2018
Sunnyvale, California, United States
Deployed a private cloud solution using Enterprise DC/OS.
Installed DC/OS 1.10 and later versions.
Containerized Java-based applications using Docker.
Automated Enterprise DC/OS deployment using Ansible.
Spark, HDFS, and Kafka frameworks on DC/OS.
Education
2013 — 2015
SUNY Polytechnic Institute
Master’s Degree
2009 — 2013
JNTUH College of Engineering Hyderabad
Bachelor’s Degree