I’m an ML Infrastructure & MLOps engineer with a deep foundation in network software engineering, distributed systems, and high-performance data center networking. For 14+ years, I’ve built and optimized the systems that move data at scale across NICs, switches, kernels, and GPU clusters.

Experience

Aurora

Staff Software Engineer

2022 — Now

Mountain View, CA

ML Infrastructure & GPU Networking

Architecting next-generation GPU cluster networking for ML training workloads, including performance testing and validation for H100/B200-class accelerators

Designed distributed observability infrastructure (Grafana, Prometheus) for ML pipeline metrics, GPU utilization, and network telemetry across training clusters

Built batch workflow orchestration for large-scale ML training jobs with fault tolerance and automatic retry logic

Optimized NCCL configs to mitigate network bottlenecks.

Time-Sensitive Networking & Synchronization

Architected a high-availability PTP/gPTP time synchronization network, achieving a 75% reduction in sync faults

Designed automated fault detection and mitigation systems including ARP protection, firewalls, and real-time monitoring

Implemented QoS tuning and traffic shaping to prioritize high-priority traffic over background data movement

Low-Latency Distributed Systems

Led networking architecture across platform, ML infrastructure, and autonomous driving stacks

Built status and fault-reporting frameworks using C++ and protobuf for sub-millisecond latency monitoring

Authored comprehensive network architecture documentation and design specifications

Argo AI

Senior Software Engineer, Network Lead

2022 — 2022

Palo Alto, California, United States

Onboard Network Architecture for ML Inference:

Owned end-to-end networking architecture connecting 12+ sensors and GPU/compute pods for real-time ML inference

Designed low-latency inter-process communication paths optimized for sensor fusion and perception model data flows

Implemented L2 multicast optimizations and TCAM tuning to eliminate bandwidth bottlenecks in high-throughput sensor streams

Time Synchronization for ML Workloads:

Led IEEE 802.1AS (gPTP) implementation achieving sub-microsecond synchronization across distributed sensors, enabling accurate temporal correlation for perception models

Built real-time monitoring tools for time-sync drift detection and automatic correction

NVIDIA

Senior Software Engineer

2020 — 2022

Santa Clara, California, United States

GPU Cluster Networking & AI Infrastructure:

Led development of In-Service Software Upgrade (ISSU), enabling zero-downtime upgrades for GPU cluster fabrics, critical for continuous AI training operations

Spearheaded RDMA over Converged Ethernet (RoCE) proof-of-concepts and deployment strategies for GPUDirect RDMA, reducing inter-GPU communication latency for distributed training

Designed and implemented kernel-bypass networking using DPDK for high-throughput, low-latency data paths in AI training clusters

Data Center Fabric Optimization:

Optimized VXLAN, EVPN, and MLAG configurations for L2/L3 GPU cluster fabrics, improving bisection bandwidth and reducing tail latency

Led QoS feature development for traffic prioritization in mixed AI training and inference workloads

Implemented SPAN/ERSPAN for network telemetry and performance debugging in production GPU clusters

Control Plane & High Availability:

Designed Smart Manager Daemon using multi-threaded ZMQ for control plane orchestration.

Implemented graceful restart protocols (BGP, OSPF, MLAG, BFD) ensuring network stability during upgrades and control plane restarts.

Cumulus Networks

Member of Technical Staff

2018 — 2020

Mountain View

Data Center Networking:

Led development of a high-availability hardware VXLAN Tunnel End Point (VTEP) control plane solution integrated with VMware NSX.

Published Cumulus Linux on VMware Solution Exchange under the Technology Alliance Partner program to raise awareness of the solution and grow the company's customer base and revenue.

Co-led design and integration of the Fastboot solution, which reduced Cumulus Linux downtime and traffic loss by 65% and improved reboot and upgrade performance.

Rewrote critical daemons to improve the scalability and speed of kernel-to-hardware configuration.

Handled critical customer escalations on Broadcom and Mellanox hardware platforms; maintained hardware vendor SDKs and added patches as required to improve performance.

Co-led integration of code sanitization software and fixed critical memory leaks and corruption, improving code quality and system reliability.

Developed the SPAN/ERSPAN feature so that global support teams and customers can quickly triage critical-path issues.

Added feature enhancements and fixes to data center features and protocols, including VXLAN, EVPN, BGP, ACL, and QoS.

Aruba, a Hewlett Packard Enterprise company

Software Firmware Engineer III

2017 — 2018

Santa Clara, California

Designed, developed, and implemented software systems while meeting quality and delivery objectives. Worked on early enablement and feature development of multi-layer switches.

Led and completed platform-independent and platform-dependent plug-ins for ACL and QoS.

Education

San Francisco State University

Master's degree

Sinhgad College of Engineering

Bachelor's degree

SVCP

Diploma