Experience
San Jose, California, United States
Dedicated to developing, optimizing, and overseeing one of the industry's most extensive cloud
infrastructures, with a focus on site reliability and cloud-managed platforms, covering big data
computing, orchestration, storage, AI/ML infrastructure, NoSQL, and relational databases.
Participate in and enhance the complete service lifecycle, from inception and design, through
development, capacity planning, launch reviews, deployment, operation, and refinement.
Design and implement software platforms and monitoring frameworks to govern service-oriented
architecture (SOA) efficiently, automatically, and intelligently.
Develop and manage components of cloud-managed data infrastructure, encompassing technologies
such as Kubernetes, Redis, MySQL, Flink, and more.
Santa Clara, California, United States
✧ Capacity Planning: Led manual capacity planning for multiple microservices across several production stacks
Built a capacity reporting tool to identify VMs and hosts with overprovisioned CPU and memory on the underlying KVM hypervisors. The automation helped fix major performance bottlenecks and cut infrastructure costs by 30%
Led a cross-team tenant onboarding project that used T-shirt sizing to streamline capacity planning
✧ Monitoring & Alerting: Collaborated with cross-functional teams to understand complex application architectures and implement effective top-down monitoring strategies, resulting in improved service visibility, reduced MTTD, and proactive issue resolution
✧ Infrastructure & Automation: Developed IaC libraries for provisioning and operating infrastructure at massive scale using Terraform
Implemented Noname WAAF across Netskope to increase visibility into our web access firewall
✧ CI/CD: Enhanced existing Jenkins deployment pipelines to reduce overall deployment time from 12 hours to 3 hours across multiple stacks
Implemented Spinnaker as the CI/CD solution for faster releases, rollbacks, and canary deployments on Kubernetes-native infrastructure
✧ Onboarding: Led system designs and features to improve availability, scalability, latency, and efficiency of multiple microservices
Embedded with product teams to ensure that applications are production-ready, scalable, and reliable
Mentored newly onboarded team members on design principles, documentation efforts, troubleshooting production application services, and SRE best practices
Led incident postmortems to identify root causes, ensure remediation, and define measures to prevent recurrence
Introduced and streamlined processes for on-call and incident management
2019 — 2020
Santa Clara, California
✧ Monitoring & Alerting: Created service monitoring dashboards, actionable incident alerts, and comprehensive runbooks
✧ CI/CD: Developed an Ansible CD pipeline to deploy packages across multiple microservices, reducing deployment time from 20 hours to 12 hours
✧ On-call: Worked 12/7 production on-call for a large fleet of hosts: monitoring host and application health, triaging and resolving errors at the application and host level, identifying and disabling faulty applications and features, leveraging SRE tools and automation, and mitigating outages
Reviewed and approved PRDs for new services and managed new services as they were onboarded for SRE support
Built an automated system to poll data from different SaaS apps and inject it into the production environment
Education
University of Southern California
Master of Science - MS
Delhi University