•Resolved 100+ incidents during on-call rotations, reducing mean time to recovery (MTTR), and participated in root cause analysis (RCA) meetings to prevent recurrence.
•Developed and deployed Python and Bash scripts that reduced manual infrastructure maintenance by 80% and improved system reliability.
•Wrote Terraform code to deploy and manage AWS infrastructure, including EC2, Aurora RDS, and ElastiCache, ensuring compliance with the security team's latest standards. Optimized AWS resource utilization by reducing over-provisioning, cutting cloud spend by 50%.
•Collaborated with developers to troubleshoot Kubernetes clusters, including pod scaling and debugging, reducing error rates by 20%.