# Sunil Kapil

> Platform / Backend Engineer | AI Platform | Cloud Infrastructure | Distributed Systems

Location: Sunnyvale, California, United States
Profile: https://flows.cv/sunilkapil

I’m a platform and backend engineer focused on building scalable cloud systems that teams can operate with confidence. My work centers on distributed services, cloud infrastructure, and the foundations of observability: logs/metrics pipelines, monitoring and alerting, and automation that reduces incident time and operational toil.

At Amazon, I work on log/metrics and monitoring platforms supporting large-scale device operations. I care a lot about turning telemetry into fast diagnosis, and I’ve been applying AI to operational workflows (for example, AI-assisted triage and insights on top of logs/metrics integrated into real team workflows).

Previously, I worked on internal platform and cloud enablement at Walmart Global Tech and led payments/platform infrastructure at Instawork. I enjoy high-ownership roles where I can set technical direction, align stakeholders, and deliver platforms that multiple teams depend on.

Focus areas: Platform/Infrastructure Engineering • Backend/Distributed Systems • Cloud (AWS/GCP/Azure) • Observability/Monitoring • AI for Operations (AI Platform workflows)

## Work Experience

### Backend and Platform (AI/Infrastructure) @ Amazon

Jan 2024 – Present | Sunnyvale, California, United States

I own and continuously improve the Log, Metrics, and Monitoring platform for Amazon’s device organization, delivering observability for millions of devices and enabling thousands of developers and QA workflows daily. My focus is building scalable telemetry infrastructure (logs + metrics), reliable querying and dashboards, and monitoring/alerting systems that improve fleet reliability and accelerate incident response.
Key contributions include scaling the end-to-end log and metrics pipeline (ingestion → storage/indexing → query → dashboards) to make debugging faster and more dependable; designing real-time device health monitoring and alerting with fleet-wide automation for operational tasks; and architecting an AI-assisted log triage system using Amazon Bedrock, RAG, and Claude (integrated with Slack, internal web portals, and paging workflows), adopted org-wide and reducing MTTR for critical device incidents. I also led a multi-service migration to a left-shift deployment strategy from the DUB region, improving release velocity and reducing production incidents across the organization.

### Staff Software Engineer Platform and Cloud Infrastructure @ Walmart Global Tech

Jan 2024 – Jan 2024 | Sunnyvale, California, United States

internal systems. My work spans backend services, cloud infrastructure, and CI/CD automation, with an emphasis on operational excellence (monitoring, alerting, incident response) and scalable architecture.

Key contributions include improving telemetry pipelines and dashboards for faster root-cause analysis; designing resilient service patterns and deployment workflows to reduce production incidents; and partnering across teams to drive standards for reliability, automation, and cost/performance efficiency across shared infrastructure.

### Lead Engineer (Platform/Infrastructure) @ Instawork

Jan 2021 – Jan 2023

At Instawork, I led payments and platform engineering and drove multiple high-impact initiatives across product, infrastructure, and risk. I designed and launched a fully automated payments system (including Instawork debit cards) that improved payout efficiency and significantly reduced manual operations and support overhead. In parallel, I spearheaded an infrastructure transformation on AWS by establishing Infrastructure-as-Code practices with Terraform, building the foundations, team practices, and delivery standards from the ground up.
I also led Trust & Safety efforts across multiple projects to strengthen payment integrity and ensure secure, on-time processing for customers at scale.

### Software Engineer @ Chegg Inc.

Jan 2014 – Jan 2021

Led architecture and delivery of a real-time writing platform on AWS using event-driven patterns. Built scalable APIs and backend services that enabled seamless third-party integrations and supported millions of users with high availability and low latency.

Modernized billing and payments across both real-time and subscription workflows. Strengthened risk controls and automation, improving reliability end-to-end and reducing payment fraud to near zero.

Built and scaled engineering teams from the ground up. Set technical direction, mentored engineers, established delivery and quality standards, and drove execution across multiple product lines in partnership with cross-functional stakeholders.

### Software Engineer @ Qualcomm, Samsung Electronics, HCL Technologies

Jan 2007 – Jan 2014

Earlier in my career, I worked across backend platforms, performance engineering, and automation at Qualcomm, Samsung Electronics, and HCL Technologies. I built performance benchmarking frameworks for WebKit/Chromium/V8, and developed automation and interoperability testing infrastructure across Android, mobile, and web platforms. I also built backend APIs for payments and search in a high-scale product environment.

## Education

### Bachelor of Technology

ITM Dehradun

## Contact & Social

- LinkedIn: https://linkedin.com/in/snlkapil

---

Source: https://flows.cv/sunilkapil
JSON Resume: https://flows.cv/sunilkapil/resume.json
Last updated: 2026-04-01