# Sarthak Gupta

> AI Infra SWE @ SoftBank • IIT Kanpur

Location: United States

Profile: https://flows.cv/sarthakgupta

> SWE focused on AI Infrastructure and ML Platforms with 5+ years building cloud-native, distributed systems that power ML training, deployment, and real-time inference in production.
>
> I specialize in ML platform engineering that improves developer velocity and system reliability, automating the ML lifecycle from experimentation through production.
>
> I've led end-to-end delivery of scalable backend services and data pipelines supporting high-throughput, mission-critical workloads.
>
> I partner closely with ML engineers and cross-functional teams to productionise models safely, improve performance, and harden systems for scale.

Core focus: AI Infrastructure | ML Platforms | Distributed Systems | Backend Infra | Kubernetes | CI/CD Automation

## Work Experience

### SWE - AI Infrastructure @ SoftBank

Jan 2025 – Present | Sunnyvale, CA

- [Overview]: Building out the AI Infrastructure and ML Platform to make the GPUs of SoftBank go brrrr... efficiently and scalably!
- [ML Deployment Infrastructure]: Orchestrated deployment of distributed ML infrastructure from scratch by engineering a command-line tool; automated Kubernetes provisioning and multi-service deployment on bare metal; reduced cluster bring-up time from 1 hour to under 5 minutes.
- [ML Monitoring]: Engineered the ML platform's observability stack by deploying the Prometheus Kubernetes stack; established a "dashboards-as-code" framework in Grafana, providing real-time monitoring; cut MTTR by 80%.
- [ML Test Infra]: Architected an end-to-end integration test pipeline using GitHub Actions and self-hosted runners; enabled on-demand testing of feature branches against live infrastructure; reduced the full regression test cycle from 3+ hours of manual validation to a 15-minute automated run.

### SWE - ML Platform (IPO in June 2025) @ Chime

Jan 2024 – Jan 2025 | Dallas-Fort Worth Metroplex

- [Overview]: Built an end-to-end ML infra pipeline for the personalised referral incentive model system, which is the largest driver of growth for the company, with 72k new enrollments per quarter.
- [ML Platform]: Developed a config-driven ML platform framework (ML Kit) that centralized model development pipelines, reducing overall build time from 4 weeks to 1 week and cutting 57 weeks of annual maintenance.
- [ML Infrastructure]: Integrated Arize AI into the existing microservices architecture for real-time model/feature drift monitoring, enabling faster detection and RCA of issues and reducing mean time to resolution by 80%.
- [CI/CD Automation]: Developed CI/CD pipelines (CircleCI, Terraform) to automate container deployments for model serving; reduced manual overhead by 50% and improved system reliability.
- [ML Training]: Implemented XGBoost model training pipelines, optimized hyperparameters, and improved referral model performance, reducing customer acquisition costs by 4x.

### SWE Intern - Backend Infrastructure (acquired by Siemens for $10B in March 2025) @ Altair

Jan 2023 – Jan 2023

- [Overview]: Developed a full-stack product publishing app to automate and streamline onboarding of new products to the company's online marketplace; reduced average onboarding time by 50%.
- [Backend Infra]: Engineered robust backend infrastructure to integrate with the marketplace's existing systems; created efficient REST APIs to communicate with the frontend; reduced integration time by 25%.
- [Databases]: Designed a scalable and flexible NoSQL database schema to store an expanding product inventory and varying product attributes; improved data retrieval speed by 33%.

### SWE - Machine Learning ($50M raised, AI in Healthcare) @ SigTuple

Jan 2018 – Jan 2021 | Bengaluru, Karnataka, India

- [Overview]: Built an end-to-end AI system leveraging computer vision and deep learning to scan and analyze microscopic images of blood/urine medical samples and generate pathology reports for patients in just 10 minutes.
- [MLOps]: Spearheaded the ML engineering team; led the product from a nascent prototype to FDA 510(k) clearance in 2.5 years.
- [ML Deployment]: Led the redesign of a monolithic platform into containerized microservices (Docker/Kubernetes), reducing release cycles by 60% and cutting compute costs by 40%.
- [Data Pipeline]: Implemented a robust data ingestion pipeline to handle and preprocess high-resolution medical images at scale, ensuring near real-time analysis.
- [ML Research]: Invented a novel model training methodology based on domain transformation of real data into pre-labelled synthetic data; reduced training-data creation time by 35% and sped up the product development lifecycle by 2.5 times.
- [ML Modelling]: Converted the existing cumbersome extraction + classification model pipeline into a streamlined single-shot object localisation pipeline; raised model performance to 95% and decreased inference time by 45%.

## Education

### Bachelor of Technology - BTech in Materials Science & Engineering

Indian Institute of Technology, Kanpur

### Master of Science - MS in Computer Science

The University of Texas at Dallas

### Post Graduate Diploma in Data Science

International Institute of Information Technology Bangalore

## Contact & Social

- LinkedIn: https://linkedin.com/in/sarthak-gpt

---

Source: https://flows.cv/sarthakgupta

JSON Resume: https://flows.cv/sarthakgupta/resume.json

Last updated: 2026-04-11