# Karthikeyan Thangaraj

> Staff Software Engineer at Walmart Global Tech

Location: Sunnyvale, California, United States
Profile: https://flows.cv/karthikeyanthangaraj

A principal production engineer with hands-on DevOps experience spanning cloud computing, large-scale production web operations, distributed systems, Linux system administration, application support, and automation.

## Work Experience
### Staff Software Engineer @ Walmart Global Tech
Jan 2021 – Present | Sunnyvale, California, United States

### Principal Production Engineer, Core Infrastructure @ Yahoo
Jan 2019 – Jan 2021 | Sunnyvale, California, United States
- Cloud Infrastructure: Managed and operated Yahoo's largest on-premises Kubernetes clusters, with ~7,300 nodes across 7 data centers hosting ~100k application containers.
- Service Mesh: Built, managed, and operated the open-source Istio service mesh to provide a uniform way to connect, secure, and monitor communication in a microservices architecture.
- Load Balancing: Built, managed, and operated highly reliable, available, and scalable software L4 load balancing for Kubernetes services.
- Traffic Management: Built, managed, and operated web proxies (ATS/Envoy) handling 2M RPS.
- Observability: Managed the monitoring infrastructure platform to achieve high availability.

### Sr Service Engineer @ Yahoo
Jan 2015 – Jan 2019 | Sunnyvale, California

### Sr Service Engineer @ Yahoo
Jan 2011 – Jan 2015 | Bangalore
- Spearheaded the reliability, scalability, and performance of high-traffic web platforms, including www.yahoo.com and my.yahoo.com, serving millions of users worldwide every day.
- Designed and maintained resilient distributed systems supporting the Yahoo front page and personalization experiences at scale.
- Improved system uptime and reduced latency through proactive monitoring, alerting, and performance tuning.
- Built and enhanced CI/CD pipelines to enable faster, safer, and more frequent deployments.
- Collaborated cross-functionally with product, data, and frontend teams to deliver personalized user experiences.
- Led incident response, root cause analysis, and postmortems to continuously improve system stability.
- Automated infrastructure provisioning and operational workflows, reducing manual effort and increasing efficiency.
- Implemented observability solutions (metrics, logging, tracing) to gain deep insight into system behavior.
- Contributed to capacity planning and scaling strategies to handle peak traffic events.
- Championed reliability engineering best practices, including SLIs/SLOs and fault tolerance.


## Contact & Social
- LinkedIn: https://linkedin.com/in/karthikeyan-thangaraj

---
Source: https://flows.cv/karthikeyanthangaraj
JSON Resume: https://flows.cv/karthikeyanthangaraj/resume.json
Last updated: 2026-04-12