# Hemanth Peddapalli

> Sr. DevOps Engineer / Site Reliability Engineer

Location: Sunnyvale, California, United States
Profile: https://flows.cv/hemanth

## Work Experience
### Sr. DevOps Engineer / Site Reliability Engineer @ FIS
Jan 2025 – Present | United States
•	Deployed and managed Prometheus and Grafana for system metrics and alerting, improving detection of infrastructure bottlenecks.
•	Deployed and managed containerized applications on the OpenShift platform, ensuring seamless continuous integration and delivery (CI/CD) pipelines.
•	Collaborated with development teams using Git and integrated it with GCP-based CI/CD tools for automated versioning and code deployment.
•	Designed, developed, and maintained AWS Glue ETL jobs to process, transform, and load large-scale structured and semi-structured data.
•	Automated data pipelines using AWS Glue Workflows, triggers, and schedules to ensure reliable data processing.
•	Implemented CI/CD pipelines using GitHub Actions, automating build, test, and deployment processes to enhance software delivery efficiency.
•	Integrated GitHub with Jenkins to streamline automated testing and deployment workflows, improving developer productivity.
•	Designed, deployed, and managed scalable Azure cloud infrastructures using Azure Virtual Machines, Virtual Networks, and Load Balancers.
•	Designed and deployed multicluster Kubernetes environments on AWS EKS, leveraging KCP for API aggregation and workspace management. 
•	Developed custom CRDs, APIResourceSchemas, APIExports, and APIBindings to enable dynamic API discovery and integration with external providers. 
•	Automated infrastructure provisioning and configuration using Terraform and Helm for consistent, repeatable deployments. 
•	Implemented centralized logging and auditing pipelines using Fluentd, CloudWatch, and S3 for compliance and troubleshooting. 
•	Created real-time metrics collection and alerting with Prometheus, Grafana, and AWS CloudWatch to monitor platform health and resource usage. 
•	Acted as on-call SRE supporting 24/7 production workloads, handling incident triage, mitigation, and escalation.

### Sr. Site Reliability Engineer DevOps Security / Site. @ MetLife
Jan 2024 – Jan 2025 | United States
•	Designed, deployed, and managed cloud infrastructure on AWS, Azure, and Google Cloud Platform (GCP) to optimize performance, scalability, and cost-efficiency.
•	Provisioned and maintained AWS and Azure infrastructure, including EC2, S3, IAM, VPC, Azure Web Apps, Storage, and Active Directory.
•	Managed microservices with Docker, Kubernetes, OpenShift, and Azure Kubernetes Service (AKS).
•	Implemented continuous integration and delivery pipelines with tools like Git, TeamCity, Octopus, and AWS CodePipeline.
•	Designed and implemented automated pipelines for AWS EC2 to OCI Compute instance migration, ensuring minimal downtime and optimized performance.
•	Configured Prometheus to collect real-time metrics from cloud infrastructure, applications, and services for performance monitoring.
•	Provisioned and maintained cloud resources across AWS, Azure, and GCP, including EC2, S3, IAM, VPC, Azure Web Apps, and GCP Compute Engine for scalable deployments.
•	Developed automation scripts with PowerShell, Ansible, and Chef to streamline deployment and infrastructure management. 
•	Defined and enforced SLOs/SLIs as part of the observability strategy, aligning system reliability targets with business objectives.
•	Deployed containerized applications and scaled Kubernetes clusters, enabling efficient orchestration and resource utilization.
•	Developed Infrastructure as Code (IaC) solutions using Terraform to automate provisioning of compute, networking, and storage resources in OCI.
•	Utilized Azure Recovery Services vaults and backups to ensure disaster recovery and data integrity.
•	Set up Prometheus Alertmanager to trigger alerts based on predefined thresholds, ensuring quick incident response and resolution.
•	Proficient in using Terraform to define, provision, and manage cloud infrastructure (AWS, GCP, Azure) through code, ensuring consistent and repeatable deployment processes for scalable and secure environments.

### Site Reliability Engineer / DevOps Cloud Engineer @ Broadridge
Jan 2022 – Jan 2024 | United States
•	Expertise in Prometheus, Grafana, ELK Stack, Datadog, and CloudWatch for proactive monitoring, logging, and incident response.
•	Experienced in Terraform, CloudFormation, and Ansible to automate provisioning and management of cloud resources.
•	Worked across all stages of the DevOps workflow, beginning with SCM commit builds, integration builds, and compilation.
•	Integrated monitoring and logging solutions using OCI Logging & Oracle Cloud Observability, ensuring proactive issue resolution and enhanced system reliability.
•	Performed kernel tuning and wrote shell scripts for system maintenance and file management.
•	Integrated observability into CI/CD pipelines, enabling shift-left monitoring and early detection of performance regressions during deployments.
•	Configured the Chef repo, set up multiple Chef workstations, and wrote Chef cookbooks and recipes to automate deployments using Spinnaker, integrated with Jenkins jobs for the CD framework.
•	Skilled in integrating Git repositories with CI/CD tools (e.g., Jenkins, GitLab CI) for automated build, test, and deployment pipelines, accelerating the software delivery process.
•	Developed automation scripts in Python (core) alongside Puppet to deploy and manage Java applications across Linux servers.
•	Utilized Datadog security monitoring features to track vulnerabilities, detect threats, and ensure compliance with industry standards.
•	Integrated Grafana with multiple data sources, including Prometheus, Elasticsearch, and Datadog, for centralized monitoring.
•	Utilized Python for data extraction, transformation, and analysis, leveraging libraries such as Pandas and NumPy to process large datasets.
•	Created Python scripts integrated with the Amazon API to control instance operations.
•	Integrated Prometheus with Grafana for real-time visualization and with tools like Kubernetes and Docker for enhanced container monitoring.

### Cloud Engineer / Linux, Windows Admin @ TEXCEL INFOTECH - India
Jan 2018 – Jan 2021 | Hyderabad, Telangana, India
•	Used Prometheus, Grafana, and kubectl to monitor cluster health, diagnose issues, and implement proactive measures for resource optimization and application reliability in Kubernetes environments.
•	Experienced with AWS EC2, EBS, ELB, Auto Scaling groups, Trusted Advisor, S3, CloudWatch, CloudFront, IAM, and Security Groups.
•	Expertise in using Git for version control to manage and track code changes, ensuring efficient collaboration across distributed teams and maintaining a clean project history.
•	Developed a custom AWS-to-OCI security policy mapping tool, converting AWS IAM roles, policies, and security groups to OCI IAM, ensuring compliance.
•	Effectively planned and deployed hybrid Cloud infrastructure in a production environment.
•	Analyzed cloud infrastructure and recommended improvements for performance and cost efficiency.
•	Designed the architecture and created the CloudFormation template to facilitate deployment.
•	Working knowledge of Linux OS fundamentals (file system, file configuration, Linux structure, directories).

•	Worked on various incidents such as ESX/ESXi server outages, datastore storage issues, vMotion, patching, snapshots, HA, and DRS.
•	Used VMware vSphere vCenter Update Manager to apply patches to ESX, ESXi, and virtual machines.
•	Maintained vCenter Servers and created virtual machine templates.
•	Performed ESX server and virtual machine tasks such as vMotion, Storage vMotion, High Availability (HA), Distributed Resource Scheduler (DRS), cloning, and snapshots.

•	Responsible for remote administration of Windows Server 2003/2008/2012 servers in a domain environment.
•	Handled service request tickets for infrastructure changes, including increases in memory, hard disk, and CPU count, V2V migrations, and software installation.


## Education
### Master of Science - MS in Computer and Information Sciences and Support Services
Trine University

### Computer Science Engineering in Computer Science
Acharya Nagarjuna University (ANU), Guntur


## Contact & Social
- LinkedIn: https://linkedin.com/in/hemanth-peddapalli

---
Source: https://flows.cv/hemanth
JSON Resume: https://flows.cv/hemanth/resume.json
Last updated: 2026-02-23