# Hemanth Peddapalli

> Sr. DevOps Engineer / Site Reliability Engineer

Location: Sunnyvale, California, United States
Profile: https://flows.cv/hemanth

## Work Experience

### Sr. DevOps Engineer / Site Reliability Engineer @ FIS

Jan 2025 – Present | United States

• Deployed and managed Prometheus and Grafana for system metrics and alerting, improving detection of infrastructure bottlenecks.
• Deployed and managed containerized applications on the OpenShift platform, ensuring seamless continuous integration and delivery (CI/CD) pipelines.
• Collaborated with development teams using Git and integrated it with GCP-based CI/CD tools for automated versioning and code deployment.
• Designed, developed, and maintained AWS Glue ETL jobs to process, transform, and load large-scale structured and semi-structured data.
• Automated data pipelines using AWS Glue workflows, triggers, and schedules to ensure reliable data processing.
• Implemented CI/CD pipelines using GitHub Actions, automating build, test, and deployment processes to improve software delivery efficiency.
• Integrated GitHub with Jenkins to streamline automated testing and deployment workflows, improving developer productivity.
• Designed, deployed, and managed scalable Azure cloud infrastructure using Azure Virtual Machines, Virtual Networks, and Load Balancers.
• Designed and deployed multicluster Kubernetes environments on AWS EKS, leveraging KCP for API aggregation and workspace management.
• Developed custom CRDs, APIResourceSchemas, APIExports, and APIBindings to enable dynamic API discovery and integration with external providers.
• Automated infrastructure provisioning and configuration using Terraform and Helm for consistent, repeatable deployments.
• Implemented centralized logging and auditing pipelines using Fluentd, CloudWatch, and S3 for compliance and troubleshooting.
• Created real-time metrics collection and alerting with Prometheus, Grafana, and AWS CloudWatch to monitor platform health and resource usage.
• Acted as on-call SRE supporting 24/7 production workloads, handling incident triage, mitigation, and escalation.

### Sr. Site Reliability Engineer DevOps Security / Site. @ MetLife

Jan 2024 – Jan 2025 | United States

• Designed, deployed, and managed cloud infrastructure on AWS, Azure, and Google Cloud Platform (GCP) to optimize performance, scalability, and cost-efficiency.
• Provisioned and maintained AWS and Azure infrastructure, including EC2, S3, IAM, VPC, Azure Web Apps, Storage, and Active Directory.
• Managed microservices with Docker, Kubernetes, OpenShift, and Azure Kubernetes Service (AKS).
• Implemented continuous integration and delivery pipelines with tools such as Git, TeamCity, Octopus, and AWS CodePipeline.
• Designed and implemented automated pipelines for migrating AWS EC2 instances to OCI Compute instances, ensuring minimal downtime and optimized performance.
• Configured Prometheus to collect real-time metrics from cloud infrastructure, applications, and services for performance monitoring.
• Provisioned and maintained cloud resources across AWS, Azure, and GCP, including EC2, S3, IAM, VPC, Azure Web Apps, and GCP Compute Engine for scalable deployments.
• Developed automation scripts with PowerShell, Ansible, and Chef to streamline deployment and infrastructure management.
• Defined and enforced SLOs/SLIs as part of the observability strategy, aligning system reliability targets with business objectives.
• Deployed containerized applications and scaled Kubernetes clusters, enabling efficient orchestration and resource utilization.
• Developed Infrastructure as Code (IaC) solutions using Terraform to automate provisioning of compute, networking, and storage resources in OCI.
• Utilized Azure Recovery Vault and backups to ensure disaster recovery and data integrity.
• Set up Prometheus Alertmanager to trigger alerts based on predefined thresholds, ensuring quick incident response and resolution.
• Used Terraform to define, provision, and manage cloud infrastructure (AWS, GCP, Azure) through code, ensuring consistent and repeatable deployment processes for scalable and secure environments.

### Site Reliability Engineer / DevOps Cloud Engineer @ Broadridge

Jan 2022 – Jan 2024 | United States

• Expertise in Prometheus, Grafana, ELK Stack, Datadog, and CloudWatch for proactive monitoring, logging, and incident response.
• Experienced in Terraform, CloudFormation, and Ansible to automate provisioning and management of cloud resources.
• Worked across all stages of the DevOps workflow, beginning with SCM commit builds and integration build compilation.
• Integrated monitoring and logging solutions using OCI Logging and Oracle Cloud Observability, ensuring proactive issue resolution and enhanced system reliability.
• Performed kernel tuning and wrote shell scripts for system maintenance and file management.
• Integrated observability into CI/CD pipelines, enabling shift-left monitoring and early detection of performance regressions during deployments.
• Experienced with Chef: configured the Chef-Repo, set up multiple Chef Workstations, and wrote Chef Cookbooks and Recipes to automate deployments using Spinnaker, integrated with Jenkins jobs for the CD framework.
• Skilled in integrating Git repositories with CI/CD tools (e.g., Jenkins, GitLab CI) for automated build, test, and deployment pipelines, accelerating the software delivery process.
• Developed automation scripts in core Python, together with Puppet, to deploy and manage Java applications across Linux servers.
• Utilized Datadog security monitoring features to track vulnerabilities, detect threats, and ensure compliance with industry standards.
• Integrated Grafana with multiple data sources, including Prometheus, Elasticsearch, and Datadog, for centralized monitoring.
• Utilized Python for data extraction, transformation, and analysis, leveraging libraries such as Pandas and NumPy to process large datasets.
• Created Python scripts integrated with the Amazon API to control instance operations.
• Integrated Prometheus with Grafana for real-time visualization and with tools such as Kubernetes and Docker for enhanced container monitoring.

### Cloud Engineer / Linux, Windows Admin @ TEXCEL INFOTECH - India

Jan 2018 – Jan 2021 | Hyderabad, Telangana, India

• Skilled in using Prometheus, Grafana, and kubectl to monitor cluster health, diagnose issues, and implement proactive measures for resource optimization and application reliability in Kubernetes environments.
• Experienced with AWS EC2, EBS, ELB scaling groups, Trusted Advisor, S3, CloudWatch, CloudFront, IAM, Security Groups, and Auto Scaling.
• Expertise in using Git for version control to manage and track code changes, ensuring efficient collaboration across distributed teams and maintaining a clean project history.
• Developed a custom AWS-to-OCI security policy mapping tool, converting AWS IAM roles, policies, and security groups to OCI IAM, ensuring compliance.
• Planned and deployed hybrid cloud infrastructure in a production environment.
• Analyzed cloud infrastructure and recommended improvements for performance gains and cost efficiency.
• Designed the architecture and created the CloudFormation template to facilitate deployment.
• Working knowledge of Linux OS fundamentals (file system, file configuration, Linux structure, directories).
• Worked on incidents such as ESX/ESXi server outages, datastore storage issues, vMotion, patching, snapshots, HA, and DRS.
• Used VMware vSphere vCenter Update Manager to apply patches to ESX, ESXi, and virtual machines.
• Maintained vCenter Servers and created virtual machine templates.
• Performed ESX server and virtual machine tasks such as vMotion, Storage vMotion, High Availability (HA), DRS (Distributed Resource Scheduler), cloning, and snapshots.
• Responsible for remote administration of Windows Server 2003/2008/2012 servers in a domain environment.
• Handled service-request tickets for infrastructure changes, including increases in memory, hard disk, and number of CPUs, V2V migrations, and software installation.

## Education

### Master of Science - MS in Computer and Information Sciences and Support Services

Trine University

### Computer Science Engineering in Computer Science

Acharya Nagarjuna University (ANU), Guntur

## Contact & Social

- LinkedIn: https://linkedin.com/in/hemanth-peddapalli

---

Source: https://flows.cv/hemanth
JSON Resume: https://flows.cv/hemanth/resume.json
Last updated: 2026-02-23