# Nilesh Shinde

> Staff Software Engineer at Aurora

Location: Fremont, California, United States
Profile: https://flows.cv/nileshshinde

I’m an ML Infrastructure & MLOps engineer with a deep foundation in network software engineering, distributed systems, and high-performance data center networking. For 14+ years, I’ve built and optimized the systems that move data at scale across NICs, switches, kernels, and GPU clusters.

My career began in core networking, and I’ve had the privilege of working at companies like NVIDIA, where I contributed to data center networking software, and HPE/Aruba, where we helped launch the OpenSwitch open-source NOS. I’ve designed and debugged everything from PTP/gPTP time-sensitive networking to RDMA fabrics, kernel-bypass transports, VLAN/QoS pipelines, switch SDK integrations (Broadcom, Marvell, Realtek), and large-scale observability systems.

Today, I apply that background to the world of Machine Learning Infrastructure, building the platforms that support large-scale training, inference, and autonomous driving workloads. 
I am excited about, and getting my hands dirty with:
- GPU-initiated networking, RDMA, NCCL
- Distributed training systems
- Cluster observability and reliability engineering
- MLOps workflows: pipelines, orchestration, experiment tracking
- High-availability systems for autonomous driving compute stacks
- Monitoring and debugging latency, throughput, and networking bottlenecks at scale

What excites me today is the convergence of networking + AI systems. Modern ML workloads demand microsecond-level precision, predictable communication patterns, and high-performance I/O paths, exactly the problems I’ve spent my career solving.

I’m passionate about building the next generation of ML systems that are faster, more efficient, and more reliable, whether through kernel-bypass networking, GPU-initiated communication, observability tooling, or highly optimized distributed training infrastructure.

If you’re working on ML systems, large-scale infra, or cutting-edge GPU networking, I’d love to connect.

## Work Experience
### Staff Software Engineer @ Aurora
Jan 2022 – Present | Mountain View, CA
ML Infrastructure & GPU Networking
Architecting next-generation GPU cluster networking for ML training workloads, including performance testing and validation for H100/B200 class accelerators
Designed distributed observability infrastructure (Grafana, Prometheus) for ML pipeline metrics, GPU utilization, and network telemetry across training clusters
Built batch workflow orchestration for large-scale ML training jobs with fault tolerance and automatic retry logic
Optimized NCCL configs to mitigate network bottlenecks.

Time-Sensitive Networking & Synchronization
Architected a high-availability PTP/gPTP time synchronization network, achieving a 75% reduction in sync faults
Designed automated fault detection and mitigation systems including ARP protection, firewalls, and real-time monitoring
Implemented QoS tuning and traffic shaping to prioritize high-priority traffic over background data movement

Low-Latency Distributed Systems
Led networking architecture across platform, ML infrastructure, and autonomous driving stacks
Built status and fault-reporting frameworks using C++ and protobuf for sub-millisecond latency monitoring
Authored comprehensive network architecture documentation and design specifications

### Senior Software Engineer, Network lead @ Argo AI
Jan 2022 – Jan 2022 | Palo Alto, California, United States
Onboard Network Architecture for ML Inference:
Owned end-to-end networking architecture connecting 12+ sensors and GPU/compute pods for real-time ML inference
Designed low-latency inter-process communication paths optimized for sensor fusion and perception model data flows
Implemented L2 multicast optimizations and TCAM tuning to eliminate bandwidth bottlenecks in high-throughput sensor streams

Time Synchronization for ML Workloads:
Led IEEE 802.1AS (gPTP) implementation achieving sub-microsecond synchronization across distributed sensors, enabling accurate temporal correlation for perception models
Built real-time monitoring tools for time-sync drift detection and automatic correction

### Senior Software Engineer @ NVIDIA
Jan 2020 – Jan 2022 | Santa Clara, California, United States
GPU Cluster Networking & AI Infrastructure:
Led development of In-Service Software Upgrade (ISSU) enabling zero-downtime upgrades for GPU cluster fabrics, critical for continuous AI training operations
Spearheaded RDMA over Converged Ethernet (RoCE) proof-of-concepts and deployment strategies for GPU Direct RDMA, reducing inter-GPU communication latency for distributed training
Designed and implemented kernel-bypass networking using DPDK for high-throughput, low-latency data paths in AI training clusters

Data Center Fabric Optimization:
Optimized VXLAN, EVPN, and MLAG configurations for L2/L3 GPU cluster fabrics, improving bisection bandwidth and reducing tail latency
Led QoS feature development for traffic prioritization in mixed AI training and inference workloads
Implemented SPAN/ERSPAN for network telemetry and performance debugging in production GPU clusters

Control Plane & High Availability:
Designed Smart Manager Daemon using multi-threaded ZMQ for control plane orchestration.
Implemented graceful restart protocols (BGP, OSPF, MLAG, BFD) ensuring network stability during upgrades and control plane restarts.

### Member Of Technical Staff @ Cumulus Networks
Jan 2018 – Jan 2020 | Mountain View
Data Center Networking:
Led development of high availability Hardware VXLAN Tunnel End Points (VTEP) control plane solution integrated with VMware NSX.
Published Cumulus Linux as a solution on VMware Solution Exchange under the Technology Alliance Partner program to raise awareness of the solution and grow the company's customer base and revenue.
Co-led design and integration of the Fastboot solution, which reduced Cumulus Linux downtime and traffic loss by 65% and improved reboot and upgrade performance.
Rewrote critical daemons to improve the scalability and speed of kernel-to-hardware configuration.
Handled critical customer escalations on Broadcom and Mellanox hardware platforms. Maintained hardware vendor SDKs and added patches as required to improve performance.
Co-led integration of code sanitization software and fixed critical memory leaks and corruption, improving code quality and system reliability.
Developed SPAN-ERSPAN feature for global support teams and customers to quickly triage the reasons for critical path issues.
Added feature enhancements and fixes to data center features and protocols including, but not limited to, VXLAN, EVPN, BGP, ACL, and QoS.

### Software Firmware Engineer III @ Aruba, a Hewlett Packard Enterprise company
Jan 2017 – Jan 2018 | Santa Clara, California
Designed, developed, and implemented software systems while meeting quality and delivery objectives. Worked on early enablement and feature development of multi-layer switches.

Led and completed platform-independent and platform-dependent plug-ins for ACL and QoS.

### Software Firmware Engineer II @ Aruba, a Hewlett Packard Enterprise company
Jan 2016 – Jan 2017 | Santa Clara, California
Contributed extensively to OpenSwitch by designing and developing a Yocto-based open-source embedded Linux project, which later became part of the Linux Foundation.

Organized meetups and brown-bag meetings to promote open-source technologies and raise community awareness of OpenSwitch and Ansible, which benefited the marketing and sales teams.

Developed and integrated Ansible on the OpenSwitch project for orchestration. Wrote playbooks with YAML and Jinja2 templates.

Oversaw TACACS+ authorization development and integration using PAM libraries and the SSH server for command authorization of local and remote users. Led the development and integration of IPv6 support for the TACACS+ feature.

Managed the complete lifecycle of sFlow feature development, from scoping and development through implementation and testing.

Delivered an NTP client feature that communicates with an NTP server to synchronize system time for precise system logging.

Wrote Dockerfiles and created custom Docker containers for feature testing.

Enhanced BGP and DHCP/TFTP features and added feature tests to validate the enhancements.

### Software Engineer @ Zhone Technologies
Jan 2013 – Jan 2015
Led the design, programming, testing, documentation, and implementation of software for Layer 2 and Layer 3 networking protocol projects involving VLAN, IGMP, GPON, Fiber LAN, L2/L3 forwarding, SNMP, packet processing, xDSL, and link aggregation.

Designed, coded, and integrated software on Linux and VxWorks platforms for network access products.
Oversaw feature development to add packet-processing rules with Broadcom FPs so that DHCP v4/v6, broadcast ARP, and ethproxy traffic is directed to a higher CoS queue.

Added functionality for replacing the Class of Service (CoS/PCP) value for single-tagged and double-tagged traffic.

Wrote Python and Shell scripts to automate the test procedures on networking devices.

### Teaching Assistant @ San Francisco State University
Jan 2012 – Jan 2013 | San Francisco
Served as Teaching Assistant for a networking concepts course, facilitating communication and providing course support to undergraduate students.
Taught Layer 2 and Layer 3 routing protocols, network design, and optimization.
Facilitated practical sessions in the computer networking lab to explain different topologies and routing concepts.

### Special Assistant for Data and IT @ San Francisco State University
Jan 2012 – Jan 2013 | San Francisco, California
Developed PHP, SQL, PSQL, and shell scripts to automate a Linux-based database server.

### Software Engineer @ Persistent Systems
Jan 2011 – Jan 2012 | Pune Area, India
Performed project planning, system analysis, software design and coding, testing, documentation, implementation, and research activities as necessary for software engineering projects.
Designed and developed software for Linux-based mail and messaging servers and developed TCP/IP Layer 2 and Layer 3 protocol features.
Developed and tested custom Linux kernel module components and device drivers, and added feature enhancements to improve the overall user experience of the product.

### Technical Engineer @ VC International Pvt. Ltd. (CISCO)
Jan 2009 – Jan 2010
Worked on TCP/IP protocols, network infrastructure, network security, and network optimization.
Configured and optimized enterprise and large office networks on Windows and Linux platforms.


## Education
### Master's degree in Embedded Electrical and Computer Systems
San Francisco State University

### Bachelor's degree in Electrical, Electronics and Communications Engineering
Sinhgad College of Engineering

### Diploma in Electrical, Electronics and Communications Engineering
SVCP


## Contact & Social
- LinkedIn: https://linkedin.com/in/nilesh-s-shinde
- Portfolio: https://nshinde.github.io/infra-for-ai

---
Source: https://flows.cv/nileshshinde
JSON Resume: https://flows.cv/nileshshinde/resume.json
Last updated: 2026-04-12