# SaiKruthi Wusirika

> Senior Software Engineer – AI Infra & Distributed Systems | LLM Serving, GPU Orchestration, Kubernetes, Observability, Open Source Contributor

Location: San Francisco Bay Area, United States
Profile: https://flows.cv/saikruthi

I build the infrastructure that powers frontier AI — from large-scale LLM serving on multi-node GPU clusters to cloud-native Kubernetes platforms handling mission-critical production workloads.

As a Senior AI Infrastructure Engineer at Multiscale AI, I design and operate the systems where uptime, throughput, and reliability are non-negotiable. I'm especially excited about building robust, large-scale infrastructure for frontier models, and I'm actively open to roles at AI labs working on safety-critical and high-impact systems.

Selected Impact:
• 4x LLM serving throughput and 60% lower P99 latency on H100 clusters via disaggregated prefill/decode (vLLM, KubeRay, Go)
• Reduced idle GPU spend by ~$200K/month with fair-share, auto-scaling GPU scheduling (Kueue, KEDA, fractional GPU partitioning)
• Cut deployment time from 2 days to 15 minutes with a multi-tenant Kubernetes IDP serving multiple engineering teams
• Built data platforms querying billions of rows with sub-second latency, reducing data discovery time by 90%
• Standardized end-to-end observability for AI agents using OpenTelemetry, Prometheus, and high-cardinality metrics

Core: Distributed systems · Large-scale LLM serving · GPU clusters · Kubernetes · Observability · ML infrastructure

Tech: Go · Python · Rust · C++ · Kubernetes (CKA) · vLLM · Triton · KubeRay · NVIDIA GPU Operator · Kueue · KEDA · ArgoCD · Terraform · Prometheus · Grafana · OpenTelemetry · Spark · Kafka · MLflow · AWS · GCP

## Work Experience
### Senior Software Engineer @ Multiscale AI
Jan 2024 – Present | Remote
Lead engineer for large-scale LLM serving and GPU infrastructure powering tier-1 customers across multi-cloud Kubernetes platforms.

• Designed and operated large-scale LLM serving pipelines using disaggregated prefill/decode on multi-node H100 clusters (vLLM, Go, KubeRay), reducing P99 latency by 60% and increasing throughput 4x for mission-critical production workloads
• Built fair-share GPU scheduling across shared clusters using Kueue, KEDA, and fractional GPU partitioning, cutting idle GPU cost by ~$200K/month while maintaining 99.9% job success and cluster availability
• Designed and deployed a multi-tenant, cloud-native Kubernetes platform (AppHub IDP) with GitOps (ArgoCD, Tekton), service mesh, and disaster recovery; cut deployment time from 2 days to 15 minutes across multiple engineering teams
• Standardized end-to-end observability for AI agents using OpenTelemetry, Prometheus, and high-cardinality metrics, enabling sub-second root cause analysis of distributed failures and improving incident MTTR
• Built MLflow-backed experiment and model infrastructure on Kubernetes (PostgreSQL, MinIO) as a shared platform, automating deployments via Go CLIs and Terraform across AWS, GCP, and Azure
• Contributed open-source patches to NVIDIA Dynamo, improving distributed inference throughput and memory efficiency on A100/H100 GPU clusters
• Led ISO/IEC 27001 and SOC 2 Type II compliance programs, architecting security controls across cloud infrastructure

### Software Engineer Intern @ Multiscale AI
Jan 2023 – Jan 2024 | Seattle, Washington, United States
• Developed and benchmarked distributed inference systems using Triton Inference Server, Kubeflow, and Kubernetes, achieving significant throughput improvements for LLM serving workloads in HPC environments

• Built RAG (Retrieval-Augmented Generation) pipelines using LangChain and LLMs, enabling semantic search over enterprise knowledge bases with sub-2-second query latency

• Deployed and managed multi-node GPU clusters using Slurm workload manager and HPCC systems, running MPI-based distributed training jobs with NVIDIA cuDNN and CUDA optimization

• Implemented end-to-end MLOps pipelines using Apache Airflow and MLflow, automating model training, validation, and deployment workflows on AWS and GCP

• Architected Nginx-based reverse proxy and service mesh configurations with Kubernetes, improving API gateway performance and enabling zero-downtime deployments

• Contributed to internal developer platform (IDP) tooling using Helm charts and YAML-based Kubernetes operators, reducing onboarding time for new engineering teams

### Software Engineer @ Wipro Limited
Jan 2019 – Jan 2022 | Bengaluru, Karnataka, India
• Designed and maintained distributed microservices architecture using Java, Python, and Scala on Kubernetes, supporting enterprise clients across banking and financial services sectors

• Built and optimized CI/CD pipelines using Jenkins, Docker, and Terraform on AWS and Azure, reducing deployment time by automating infrastructure provisioning and release management

• Implemented container orchestration solutions using Kubernetes and Docker Swarm, migrating legacy monolithic applications to cloud-native microservices architecture

• Developed high-throughput data processing systems using Apache Airflow and SQL, handling large-scale ETL workflows for enterprise analytics platforms

• Engineered reliability improvements including Nginx load balancing, DNS configuration, and Single Sign-On (SSO) integrations, improving system uptime and security posture for client-facing applications

• Collaborated with cross-functional teams to deliver SOC 2-compliant software systems, implementing security controls and audit logging across distributed services running on AWS and Azure infrastructure

### Ex-Co-Founder @ Filvelop
Jan 2021 – Jan 2022 | Bengaluru, Karnataka, India
Co-founded an ed-tech startup building an integrated, single-window application portal that streamlines student access to academic and career resources; contributed to the platform's design, functionality, and market readiness.
• Drove product adoption by onboarding schools, colleges and universities across regions.
• Built and led a team of 20 across tech, operations, and outreach functions.
• Formed strategic partnerships with resource and software providers and onboarded an advisory board.
• Led core functions including team building, operational strategy, and technology development management.
• Led platform design, market readiness, and go-to-market planning for the student-facing application.

### Assistant Area Director Administration @ Toastmasters International
Jan 2018 – Jan 2019 | Vellore Area, India

### Member @ District 82 Toastmasters
Jan 2016 – Jan 2019 | VIT University

### Vice President Membership @ District 82 Toastmasters
Jan 2017 – Jan 2018 | Vellore Area, India

### Secretary @ District 82 Toastmasters
Jan 2017 – Jan 2017 | Vellore Area, India
One of the club's executive officers; headed the club's administration and strategic affairs.


## Education
### Master's degree in Computer Science
Stevens Institute of Technology

### Bachelor's degree in Computer Engineering
Vellore Institute of Technology


## Contact & Social
- LinkedIn: https://linkedin.com/in/kruthiwusirika
- GitHub: https://github.com/kruthiwusirika
- Portfolio: https://medium.com/@kruthiwusirika

---
Source: https://flows.cv/saikruthi
JSON Resume: https://flows.cv/saikruthi/resume.json
Last updated: 2026-03-29