# SaiKruthi Wusirika

> Senior Software Engineer – AI Infra & Distributed Systems | LLM Serving, GPU Orchestration, Kubernetes, Observability, Open Source Contributor

Location: San Francisco Bay Area, United States
Profile: https://flows.cv/saikruthi

I build the infrastructure that powers frontier AI, from large-scale LLM serving on multi-node GPU clusters to cloud-native Kubernetes platforms handling mission-critical production workloads. As a Senior AI Infrastructure Engineer at Multiscale AI, I design and operate the systems where uptime, throughput, and reliability are non-negotiable.

I'm especially excited about building robust, large-scale infrastructure for frontier models, and I'm actively open to roles at AI labs working on safety-critical and high-impact systems.

Selected Impact:
• 4x LLM serving throughput and 60% lower P99 latency on H100 clusters via disaggregated prefill/decode (vLLM, KubeRay, Go)
• Reduced idle GPU spend by ~$200K/month with fair-share, auto-scaling GPU scheduling (Kueue, KEDA, fractional GPU partitioning)
• Cut deployment time from 2 days to 15 minutes with a multi-tenant Kubernetes IDP serving multiple engineering teams
• Built data platforms querying billions of rows with sub-second latency, reducing data discovery time by 90%
• Standardized end-to-end observability for AI agents using OpenTelemetry, Prometheus, and high-cardinality metrics

Core: Distributed systems · Large-scale LLM serving · GPU clusters · Kubernetes · Observability · ML infrastructure

Tech: Go · Python · Rust · C++ · Kubernetes (CKA) · vLLM · Triton · KubeRay · NVIDIA GPU Operator · Kueue · KEDA · ArgoCD · Terraform · Prometheus · Grafana · OpenTelemetry · Spark · Kafka · MLflow · AWS · GCP

## Work Experience

### Senior Software Engineer @ Multiscale AI
Jan 2024 – Present | Remote

Lead engineer for large-scale LLM serving and GPU infrastructure powering tier-1 customers across multi-cloud Kubernetes platforms.
• Designed and operated large-scale LLM serving pipelines using disaggregated prefill/decode on multi-node H100 clusters (vLLM, Go, KubeRay), reducing P99 latency by 60% and increasing throughput 4x for mission-critical production workloads
• Built fair-share GPU scheduling across shared clusters using Kueue, KEDA, and fractional GPU partitioning, cutting idle GPU cost by ~$200K/month while maintaining 99.9% job success and cluster availability
• Designed and deployed a multi-tenant, cloud-native Kubernetes platform (AppHub IDP) with GitOps (ArgoCD, Tekton), service mesh, and disaster recovery; cut deployment time from 2 days to 15 minutes across multiple engineering teams
• Standardized end-to-end observability for AI agents using OpenTelemetry, Prometheus, and high-cardinality metrics, enabling sub-second root cause analysis of distributed failures and improving incident MTTR
• Built MLflow-backed experiment and model infrastructure on Kubernetes (PostgreSQL, MinIO) as a shared platform, automating deployments via Go CLIs and Terraform across AWS, GCP, and Azure
• Contributed open-source patches to NVIDIA Dynamo, improving distributed inference throughput and memory efficiency on A100/H100 GPU clusters
• Led ISO/IEC 27001 and SOC2 Type II compliance programs, architecting security controls across cloud infrastructure

### Software Engineer Intern @ Multiscale AI
Jan 2023 – Jan 2024 | Seattle, Washington, United States

• Developed and benchmarked distributed inference systems using Triton Inference Server, Kubeflow, and Kubernetes, achieving significant throughput improvements for LLM serving workloads in HPC environments
• Built RAG (Retrieval-Augmented Generation) pipelines using LangChain and LLMs, enabling semantic search over enterprise knowledge bases with sub-2-second query latency
• Deployed and managed multi-node GPU clusters using the Slurm workload manager and HPCC systems, running MPI-based distributed training jobs with NVIDIA cuDNN and CUDA optimization
• Implemented end-to-end MLOps pipelines using Apache Airflow and MLflow, automating model training, validation, and deployment workflows on AWS and GCP
• Architected Nginx-based reverse proxy and service mesh configurations with Kubernetes, improving API gateway performance and enabling zero-downtime deployments
• Contributed to internal developer platform (IDP) tooling using Helm charts and YAML-based Kubernetes operators, reducing onboarding time for new engineering teams

### Software Engineer @ Wipro Limited
Jan 2019 – Jan 2022 | Bengaluru, Karnataka, India

• Designed and maintained a distributed microservices architecture using Java, Python, and Scala on Kubernetes, supporting enterprise clients across the banking and financial services sectors
• Built and optimized CI/CD pipelines using Jenkins, Docker, and Terraform on AWS and Azure, reducing deployment time by automating infrastructure provisioning and release management
• Implemented container orchestration solutions using Kubernetes and Docker Swarm, migrating legacy monolithic applications to a cloud-native microservices architecture
• Developed high-throughput data processing systems using Apache Airflow and SQL, handling large-scale ETL workflows for enterprise analytics platforms
• Engineered reliability improvements including Nginx load balancing, DNS configuration, and Single Sign-On (SSO) integrations, improving system uptime and security posture for client-facing applications
• Collaborated with cross-functional teams to deliver SOC2-compliant software systems, implementing security controls and audit logging across distributed services running on AWS and Azure infrastructure

### Ex-Co-Founder @ Filvelop
Jan 2021 – Jan 2022 | Bengaluru, Karnataka, India

Co-founded an ed-tech startup building an integrated, single-window application portal that streamlines student access to academic and career resources; contributed to the platform’s design, functionality, and market readiness.
• Drove product adoption by onboarding schools, colleges, and universities across regions.
• Built and led a team of 20 across tech, operations, and outreach functions.
• Formed strategic partnerships with resource and software providers and onboarded an advisory board.
• Led core functions including team building, operational strategy, and technology development management.
• Led platform design, market readiness, and go-to-market planning for the student-facing application.

### Assistant Area Director Administration @ Toastmasters International
Jan 2018 – Jan 2019 | Vellore Area, India

### Member @ District 82 Toastmasters
Jan 2016 – Jan 2019 | VIT University

### Vice President Membership @ District 82 Toastmasters
Jan 2017 – Jan 2018 | Vellore Area, India

### Secretary @ District 82 Toastmasters
Jan 2017 – Jan 2017 | Vellore Area, India

One of the club's executive officers. Headed the club's administration and strategic affairs.

## Education

### Master's degree in Computer Science
Stevens Institute of Technology

### Bachelor's degree in Computer Engineering
Vellore Institute of Technology

## Contact & Social

- LinkedIn: https://linkedin.com/in/kruthiwusirika
- GitHub: https://github.com/kruthiwusirika
- Portfolio: https://medium.com/@kruthiwusirika

---

Source: https://flows.cv/saikruthi
JSON Resume: https://flows.cv/saikruthi/resume.json
Last updated: 2026-03-29