# Nilesh Shinde

> Staff Software Engineer at Aurora

Location: Fremont, California, United States
Profile: https://flows.cv/nileshshinde

I’m an ML Infrastructure & MLOps engineer with a deep foundation in network software engineering, distributed systems, and high-performance data center networking. For 14+ years, I’ve built and optimized the systems that move data at scale across NICs, switches, kernels, and GPU clusters.

My career began in core networking, and I’ve had the privilege of working at companies like NVIDIA, where I contributed to data center networking software, and HPE/Aruba, where we helped launch the OpenSwitch open-source NOS. I’ve designed and debugged everything from PTP/gPTP time-sensitive networking to RDMA fabrics, kernel-bypass transports, VLAN/QoS pipelines, switch SDK integrations (Broadcom, Marvell, Realtek), and large-scale observability systems.

Today, I apply that background to the world of Machine Learning Infrastructure, building the platforms that support large-scale training, inference, and autonomous driving workloads.

I am excited about, and getting my hands dirty with:

- GPU-initiated networking, RDMA, NCCL
- Distributed training systems
- Cluster observability and reliability engineering
- MLOps workflows: pipelines, orchestration, experiment tracking
- High-availability systems for autonomous driving compute stacks
- Monitoring and debugging latency, throughput, and networking bottlenecks at scale

What excites me today is the convergence of networking + AI systems. Modern ML workloads demand microsecond-level precision, predictable communication patterns, and high-performance I/O paths, exactly the problems I’ve spent my career solving. I’m passionate about building the next generation of ML systems that are faster, more efficient, and more reliable, whether through kernel-bypass networking, GPU-initiated communication, observability tooling, or highly optimized distributed training infrastructure.
If you’re working on ML systems, large-scale infra, or cutting-edge GPU networking, I’d love to connect.

## Work Experience

### Staff Software Engineer @ Aurora

Jan 2022 – Present | Mountain View, CA

ML Infrastructure & GPU Networking:

- Architecting next-generation GPU cluster networking for ML training workloads, including performance testing and validation for H100/B200-class accelerators
- Designed distributed observability infrastructure (Grafana, Prometheus) for ML pipeline metrics, GPU utilization, and network telemetry across training clusters
- Built batch workflow orchestration for large-scale ML training jobs with fault tolerance and automatic retry logic
- Optimized NCCL configs to mitigate network bottlenecks

Time-Sensitive Networking & Synchronization:

- Architected a high-availability PTP/gPTP time synchronization network, achieving a 75% reduction in sync faults
- Designed automated fault detection and mitigation systems, including ARP protection, firewalls, and real-time monitoring
- Implemented QoS tuning and traffic shaping to prioritize high-priority traffic over background data movement

Low-Latency Distributed Systems:

- Led networking architecture across platform, ML infrastructure, and autonomous driving stacks
- Built status and fault-reporting frameworks using C++ and protobuf for sub-millisecond latency monitoring
- Authored comprehensive network architecture documentation and design specifications

### Senior Software Engineer, Network lead @ Argo AI

Jan 2022 – Jan 2022 | Palo Alto, California, United States

Onboard Network Architecture for ML Inference:

- Owned end-to-end networking architecture connecting 12+ sensors and GPU/compute pods for real-time ML inference
- Designed low-latency inter-process communication paths optimized for sensor fusion and perception model data flows
- Implemented L2 multicast optimizations and TCAM tuning to eliminate bandwidth bottlenecks in high-throughput sensor streams

Time Synchronization for ML Workloads:

- Led IEEE 802.1AS (gPTP) implementation achieving sub-microsecond synchronization across distributed sensors, enabling accurate temporal correlation for perception models
- Built real-time monitoring tools for time-sync drift detection and automatic correction

### Senior Software Engineer @ NVIDIA

Jan 2020 – Jan 2022 | Santa Clara, California, United States

GPU Cluster Networking & AI Infrastructure:

- Led development of In-Service Software Upgrade (ISSU), enabling zero-downtime upgrades for GPU cluster fabrics, critical for continuous AI training operations
- Spearheaded RDMA over Converged Ethernet (RoCE) proof-of-concepts and deployment strategies for GPUDirect RDMA, reducing inter-GPU communication latency for distributed training
- Designed and implemented kernel-bypass networking using DPDK for high-throughput, low-latency data paths in AI training clusters

Data Center Fabric Optimization:

- Optimized VXLAN, EVPN, and MLAG configurations for L2/L3 GPU cluster fabrics, improving bisection bandwidth and reducing tail latency
- Led QoS feature development for traffic prioritization in mixed AI training and inference workloads
- Implemented SPAN/ERSPAN for network telemetry and performance debugging in production GPU clusters

Control Plane & High Availability:

- Designed the Smart Manager Daemon using multi-threaded ZMQ for control plane orchestration
- Implemented graceful restart protocols (BGP, OSPF, MLAG, BFD), ensuring network stability during upgrades and control plane restarts

### Member Of Technical Staff @ Cumulus Networks

Jan 2018 – Jan 2020 | Mountain View

Data Center Networking:

- Led development of a high-availability hardware VXLAN Tunnel End Point (VTEP) control plane solution integrated with VMware NSX.
- Published Cumulus Linux as a solution on VMware Solution Exchange under the Technology Alliance Partner program to increase awareness of the solution, growing the company's customer base and revenue.
- Co-led design and integration of the Fastboot solution, which reduced downtime and traffic loss of Cumulus Linux by 65% and improved reboot and upgrade performance.
- Rewrote critical daemons to improve the scalability and speed of kernel-to-hardware configuration.
- Handled critical customer escalations on Broadcom and Mellanox hardware platforms; maintained hardware vendor SDKs and added patches as required to improve performance.
- Co-led integration of code sanitization software and fixed critical memory leaks and corruption, improving code quality and system reliability.
- Developed the SPAN/ERSPAN feature so global support teams and customers can quickly triage the causes of critical-path issues.
- Added feature enhancements and fixes to data center features and protocols including, but not limited to, VXLAN, EVPN, BGP, ACL, and QoS.

### Software Firmware Engineer III @ Aruba, a Hewlett Packard Enterprise company

Jan 2017 – Jan 2018 | Santa Clara, California

- Designed, developed, and implemented software systems while achieving quality and delivery objectives.
- Worked on early enablement and feature development of multi-layer switches.
- Led and completed platform-independent and platform-dependent plug-ins for ACL and QoS.

### Software Firmware Engineer II @ Aruba, a Hewlett Packard Enterprise company

Jan 2016 – Jan 2017 | Santa Clara, California

- Contributed extensively to OpenSwitch by designing and developing a Yocto-based open source embedded Linux project, which later became part of the Linux Foundation.
- Organized meetups and brown-bag meetings to promote open source technologies and raise community awareness of OpenSwitch and Ansible, which benefited the marketing and sales teams.
- Developed and integrated Ansible on the OpenSwitch project for orchestration; wrote playbooks with YAML and Jinja2 templates.
- Oversaw TACACS+ authorization development and integration using PAM libraries and the SSH server for command authorization of local and remote users.
- Led the development and integration of IPv6 support for the TACACS+ feature.
- Managed the complete lifecycle of sFlow feature development, from scoping, development, and implementation to testing.
- Delivered an NTP client feature that communicates with an NTP server to synchronize system time for precise logging.
- Wrote Dockerfiles and created custom Docker containers for feature testing.
- Enhanced BGP and DHCP/TFTP features and added feature tests to validate the enhancements.

### Software Engineer @ Zhone Technologies

Jan 2013 – Jan 2015

- Led the design, programming, testing, documentation, and implementation of software for Layer 2 and Layer 3 networking protocol projects involving VLAN, IGMP, GPON, Fiber LAN, L2/L3 forwarding, SNMP, packet processing, xDSL, and link aggregation.
- Designed, coded, and integrated software on Linux and VxWorks platforms for network access products.
- Oversaw feature development to add packet-processing rules with Broadcom FPs so that DHCP v4/v6, broadcast ARP, and ethproxy traffic go to a higher CoS queue.
- Added functionality for replacing the Class of Service (CoS/PCP) for single-tagged and double-tagged traffic.
- Wrote Python and shell scripts to automate test procedures on networking devices.

### Teaching Assistant @ San Francisco State University

Jan 2012 – Jan 2013 | San Francisco

- Served as Teaching Assistant for a networking concepts course, facilitating communication among, and providing course support to, undergraduate students.
- Taught Layer 2 and Layer 3 routing protocols, network design, and optimization.
- Facilitated practical sessions in the computer networking lab to explain different topologies and routing concepts.
### Special Assistant for Data and IT @ San Francisco State University

Jan 2012 – Jan 2013 | San Francisco, California

- Developed PHP, SQL, PSQL, and shell scripts to automate a Linux-based database server.

### Software Engineer @ Persistent Systems

Jan 2011 – Jan 2012 | Pune Area, India

- Performed project planning, system analysis, software design and coding, testing, documentation, implementation, and research activities as necessary for software engineering projects.
- Designed and developed software for Linux-based mail and messaging servers and developed TCP/IP Layer 2 and Layer 3 protocol features.
- Developed and tested custom Linux kernel module components and device drivers, and added feature enhancements to improve the overall user experience of the product.

### Technical Engineer @ VC International Pvt. Ltd. (CISCO)

Jan 2009 – Jan 2010

- Worked on TCP/IP protocols, network infrastructure, network security, and network optimization.
- Configured and optimized enterprise and large office networks on Windows and Linux platforms.

## Education

### Master's degree in Embedded Electrical and Computer Systems

San Francisco State University

### Bachelor's degree in Electrical, Electronics and Communications Engineering

Sinhgad College of Engineering

### Diploma in Electrical, Electronics and Communications Engineering

SVCP

## Contact & Social

- LinkedIn: https://linkedin.com/in/nilesh-s-shinde
- Portfolio: https://nshinde.github.io/infra-for-ai

---

Source: https://flows.cv/nileshshinde
JSON Resume: https://flows.cv/nileshshinde/resume.json
Last updated: 2026-04-12