# Karthikeyan Thangaraj

> Staff Software Engineer at Walmart Global Tech

Location: Sunnyvale, California, United States

Profile: https://flows.cv/karthikeyanthangaraj

Principal production engineer with hands-on DevOps experience spanning cloud computing, large-scale production web operations, distributed systems, Linux system administration, application support, and automation.

## Work Experience

### Staff Software Engineer @ Walmart Global Tech

Jan 2021 – Present | Sunnyvale, California, United States

### Principal Production Engineer, Core Infrastructure @ Yahoo

Jan 2019 – Jan 2021 | Sunnyvale, California, United States

- Cloud Infrastructure: Managed and operated Yahoo's largest on-prem Kubernetes clusters, with ~7,300 nodes across 7 data centers hosting ~100k application containers.
- Service Mesh: Built, managed, and operated the open-source Istio service mesh to provide a uniform way to connect, secure, and monitor communication in a microservices architecture.
- Load Balancing: Built, managed, and operated highly reliable, available, and scalable software L4 load balancing for Kubernetes services.
- Traffic Management: Built, managed, and operated web proxies (ATS/Envoy) handling 2M RPS.
- Observability: Managed the monitoring infrastructure platform to achieve high availability.

### Sr Service Engineer @ Yahoo

Jan 2015 – Jan 2019 | Sunnyvale, California

### Sr Service Engineer @ Yahoo

Jan 2011 – Jan 2015 | Bangalore

- Spearheaded the reliability, scalability, and performance of high-traffic web platforms, including www.yahoo.com and my.yahoo.com, serving millions of users worldwide daily.
- Designed and maintained resilient distributed systems supporting the Yahoo front page and personalization experiences at scale.
- Improved system uptime and reduced latency through proactive monitoring, alerting, and performance tuning.
- Built and enhanced CI/CD pipelines to enable faster, safer, and more frequent deployments.
- Collaborated cross-functionally with product, data, and frontend teams to deliver personalized user experiences.
- Led incident response, root cause analysis, and postmortems to continuously improve system stability.
- Automated infrastructure provisioning and operational workflows, reducing manual effort and increasing efficiency.
- Implemented observability solutions (metrics, logging, tracing) to gain deep insight into system behavior.
- Contributed to capacity planning and scaling strategies to handle peak traffic events.
- Championed reliability engineering best practices, including SLIs/SLOs and fault tolerance.

## Contact & Social

- LinkedIn: https://linkedin.com/in/karthikeyan-thangaraj

---

Source: https://flows.cv/karthikeyanthangaraj

JSON Resume: https://flows.cv/karthikeyanthangaraj/resume.json

Last updated: 2026-04-12