# Shweta Sahni

> SWE at LinkedIn

Location: Sunnyvale, California, United States

Profile: https://flows.cv/shwetasahni

With over 3.5 years at LinkedIn and extensive experience in Site Reliability Engineering, I focus on building resilient systems that ensure uptime and stability across critical business services. By leveraging metrics such as MTTR and MTTD, we foster a culture of performance excellence and service reliability. Our team collaborates closely with development partners to deliver robust, scalable solutions while maintaining high operational standards.

Proficient in cross-functional team leadership, observability, and auto-remediation systems, I lead initiatives that prioritize system resilience and team development. My goal is to empower teams and align engineering capabilities with organizational needs, driving reliability and innovation within LinkedIn's ecosystem.

## Work Experience

### Staff Software Engineer @ LinkedIn
Jan 2026 – Present

### Staff Site Reliability Engineer, PE @ LinkedIn
Jan 2021 – Present | Sunnyvale, California, United States

Team Leadership and Infrastructure Management: Established and led a dedicated Dev-Embedded SRE team responsible for ensuring the stability, usability, and uptime of LinkedIn's key products and services, including Marketing, Sales, and Helpcenter teams.

Promoting Site Reliability and Performance: Implemented and utilized critical metrics such as Mean Time to Recovery (MTTR), Mean Time to Detection (MTTD), and service reliability to foster a culture of site uptime and resilience within our development partner organization.

Investment in Personal Development: Prioritized the professional growth of each team member, ensuring that our collective skill set remained at the forefront of engineering capabilities, enabling us to make substantial contributions to the organization.

Enhancing Developer Productivity: Elevated developer productivity by implementing auto-remediation, alerting systems, and disaster recovery mechanisms.

Innovative Solutions: Pioneered the development of a template-based auto-remediation system for applications hosted on the Azure cloud platform. Led a project that harnessed Generative AI to optimize end-to-end incident resolution processes. This initiative significantly improved MTTR and MTTD, further enhancing our service reliability and performance.

### Site Reliability Engineer @ Walmart eCommerce
Jan 2020 – Jan 2021 | Sunnyvale, California, United States

Core Principles of My Time at Walmart:

- Operations and Management: Oversaw daily operations and management tasks.
- Site Incidents: Managed various site incidents and projects.
- Incident Triage and Troubleshooting: Efficiently addressed and resolved incidents.
- System and Application Profiling: Analyzed system and application performance.
- Tool Architecture/Design/Development: Designed and developed essential tools.
- Business Continuity and Planning: Contributed to business continuity planning.
- Cloud Migration: Led efforts in migrating to cloud environments.
- Observability/Monitoring: Enhanced system observability for better insights and used Grafana metrics to monitor system performance.
- Diagnostics and Alerts: Implemented diagnostics and alerting systems.
- Disaster Recovery: Developed robust disaster recovery solutions.
- Chaos Engineering: Applied chaos engineering principles.
- Containerization/Kubernetes: Utilized containerization for efficient deployments.

Achievements and Contributions:

- Automation Expertise: Leveraged Python and Docker for workflow automation.
- Ansible Playbook Creation: Developed Ansible playbooks for streamlined operations.
- Recognition: Received the prestigious Make a Difference Award in 2021 for exceptional contributions to Walmart's site availability, particularly during the Thanksgiving sales period.

### DevOps Engineer @ Apple @ TEKsystems
Jan 2019 – Jan 2020 | Sunnyvale, California

- Automated release workflows in Python.
- Designed and implemented CI/CD pipelines for builds and deployments using Groovy.
- Performed code reviews and code quality analysis.
- Automated configuration changes using Ansible playbooks.
- Created Dockerfiles for Java applications.
- Built Jenkins jobs to build and run Docker container clusters managed by Kubernetes. Used Kubernetes and Docker as the runtime environment in which the CI/CD system builds, tests, and deploys.

### DevOps Engineer @ Capgemini
Jan 2017 – Jan 2018 | Pune Area, India

- Performed build and deployment activities for the Dev, QA, Stage, and Production environments.
- Maintained the Nexus artifact repository and moved builds to it using Jenkins.
- Troubleshot build, deployment, and environment issues to keep the environments stable.
- Planned, scheduled, and documented releases in Confluence.
- Designed and implemented a Continuous Integration process using Maven and Jenkins.
- Configured Jenkins pipelines from scratch for the CI/CD process, writing the Jenkinsfile in Groovy-based Declarative syntax.
- Used Git as the SCM when configuring Jenkins pipelines, and also to store versions of the packages deployed in our application.
- Used Ansible playbooks and Python/Shell scripts to automate infrastructure provisioning and maintain programmable infrastructure from Dev through Test to Production.
- Drove collaboration with the testing team to resolve issues and ensure releases shipped as planned with high quality.
- Supported developers in resolving pre- and post-deployment issues and performed root cause analysis (RCA) on issues.
- Worked with Docker containers, using a consistent image across all dev, testing, and staging environments, and maintained Docker images in a Docker registry.
- Implemented a production-ready, load-balanced, highly available, fault-tolerant Kubernetes infrastructure.
- Implemented Nagios for monitoring and analyzing the network loads on individual machines.

### Sr. Technology Integration Engineer @ Amdocs
Jan 2013 – Jan 2017 | Pune, Maharashtra, India

- Deployed applications and configured clusters in WebSphere.
- Developed various Shell and Perl scripts to automate infrastructure tasks.
- Configured JDBC, JMS queues, and resource adapters.
- Monitored and supported production activities at AT&T.
- Administered and maintained applications on a daily basis.
- Supported system testing, E2E testing, and UAT.
- Resolved issues and bugs raised by customers efficiently and on time.
- Configured new upgrades, enhancements, and changes to the existing system configuration.
- Fixed defects as they arose.

### SCS @ Aon Hewitt Consulting (Shanghai) Co Ltd
Jan 2013 – Jan 2013 | Noida, Uttar Pradesh, India

### SCS- Trainee @ Aon Hewitt Consulting (Shanghai) Co Ltd
Jan 2013 – Jan 2013 | Noida, Uttar Pradesh, India

## Education

### Bachelor of Technology in Electronics and Communications Engineering
Vaish College, Rohtak.

## Contact & Social

- LinkedIn: https://linkedin.com/in/shweta-sahni-39056124

---

Source: https://flows.cv/shwetasahni

JSON Resume: https://flows.cv/shwetasahni/resume.json

Last updated: 2026-04-12