# Vincent Su

> Staff software engineer at VMware/Broadcom/Omnissa

Location: San Jose, California, United States
Profile: https://flows.cv/vincentsu

## Work Experience

### Staff Software Engineer @ VMware

Jan 2022 – Present | United States

AI project: Senior Staff Software Engineer – VMware/Broadcom/Omnissa (2022–Present)

- Designed CUDA/C++ pipelines improving throughput by 30%+ across large datasets
- Architected RDMA transport layers (RoCEv2, InfiniBand) for ultra-low latency systems
- Built distributed ML pipelines (Spark/Flink) handling millions of requests/sec
- Led GPU performance profiling and optimization across heterogeneous environments
- Implemented observability frameworks for distributed tracing and debugging
- Worked on parallel programming with communication runtimes (MPI, NCCL, UCX)
- Worked with containers, cloud provisioning, and scheduling tools (Kubernetes, Docker)
- Experience with deep learning frameworks such as PyTorch and TensorFlow, CUDA programming on GPUs, and machine learning concepts
- Worked with LLM APIs such as OpenAI GPT models, and with open-source LLMs such as Llama 2 and Llama 3 through tools such as Hugging Face
- Architected and optimized RDMA transport protocols (RoCEv2, InfiniBand) to enable ultra-low latency and high-throughput data movement for large-scale AI workloads
- Led efforts to profile and optimize inference workloads across NVIDIA and AMD GPUs using CUDA, ROCm, and Triton, identifying kernel-level inefficiencies
- Architected a large-scale real-time bidding platform processing millions of bidding requests per second under strict sub-100 ms latency SLAs
- Designed and optimized distributed ML data pipelines using Apache Flink and Spark for online learning and real-time inference
- Built cloud-native inference pipelines on Kubernetes and AWS, optimizing GPU utilization, latency, and throughput for large-scale model serving
- Implemented observability and reliability frameworks such as OpenTelemetry to monitor performance
- Applied systems and GPU performance engineering expertise from VMware Cloud
- Worked with open-source communities to enhance libraries such as CCCL, RAPIDS, and UCX
- MDM server, mobile (Android, iOS), and web for cloud End User Computing

### Software Architect, Agora Inc. @ Agora

Jan 2020 – Jan 2022 | United States

Mobile iOS, Android, Web, and Gaming (Unity, Cocos)

- Worked with the team to design software architecture blocks and provide end-to-end software solutions
- Worked directly with the Business Development team as the key technical liaison for video conference and streaming products
- Acted as the in-house mobile software architect for conference and live event streaming SDKs covering audio, video, streaming to CDNs (YouTube, FB), and RTMP/inject livestream to conference/room across iOS, Android, Web, and gaming platforms
- Developed mobile and gaming applications with Xcode, Android Studio, Swift, Kotlin, Unity, Objective-C, Java, and C++
- Worked on WebRTC with the Opus audio codec and the VP8 and H.264 video codecs, with front-end web development experience (JavaScript, HTML5, CSS, React.js, Vue.js, Angular)

### Principal Software Engineer @ Goke Microelectronics Co., Ltd.

Jan 2018 – Jan 2020 | United States

- Linux, mobile, video streaming/transcoding/GPU
- AI/ML/Jetson Nano, GPU/CUDA, Ceph storage
- Led an in-house engineering team and worked with a counterpart team overseas
- Worked on NVR-based software on the Jetson Nano platform for media streaming (GStreamer/FFmpeg) from distributed sources
  (ONVIF multicast discovery, RTSP pause, replay, transcoding operations, and motion detection with TensorFlow/OpenCV)
- NVR video streaming on NVIDIA HAL/H.264/H.265 drivers (GPU hardware acceleration)
- NVR AI-based project for deep learning and machine learning on NVIDIA Jetson Nano (CUDA)
- Cloud programming and implementation of a distributed file system on top of Ceph/CentOS/Ubuntu, with storage services for object, file, and block devices in the cloud
- Programmed against the Ceph Object Gateway through the OpenStack Swift API
- Programmed RDMA (Remote Direct Memory Access) with UCX and Accelio
- Mobile-app-hosted personal AI hybrid storage system between cloud and private wares/NAS, built with iOS/Swift and Android/Java/Kotlin mobile computing, photo management, video streaming (HLS), STUN, and UPnP

### Senior Technical Leader @ Cisco Systems

Jan 2012 – Jan 2016

WebEx Mobile high-performance client computing:

- Integrated a third-party voice engine into the WebEx Hybrid Media Gateway (VoIP, legacy VoIP, PSTN), notably for mobile clients as well as desktop clients
- Audio/video on Apple iOS/Xcode, Google Android phones (HTC/EVO, Samsung/Galaxy 6, Motorola/Droid), Cisco Cius, RIM BlackBerry, Apple Watch, and iPhone/iPad, with mobile VoIP development including SIP, multicast, RTP, and RTCP native stack support, and Mobicents SIP Servlets/Tomcat acting as a B2BUA
- Strong background in media technology, with a thorough understanding of the RTP, RTCP, SIP, and RTSP standards and their implementations
- Fluent in Objective-C, Swift, and iOS/Xcode development methodologies, with Java and Kotlin Android development experience
- Experience with GUI front ends and with media control and processing across different audio codecs, along with memory profiling and optimization

### Technical Leader @ Cisco Systems

Jan 2005 – Jan 2016

- SIP/VoIP/RTP/RTCP/CUCM: Worked on the ENUM server (Galileo), a Linux-based address lookup server that integrates with ENUM (E.164 Number Mapping) data management to provide real-time audio/video call routing control between corporate users.
  Each corporate site consists of an SBC (Session Border Controller) and firewalls.
- SaaS/PaaS: WebEx web conferencing. Users can use video/audio services for ad-hoc meetings, and all meeting data can be archived in the cloud. The data service involves virtualization and distributed computing.
  - Client/server model computing: the application communication model sits in the cloud. Hybrid access solutions are also provided: in the cloud, on-premises, or a combination of both. Linux servers are located in the cloud, while desktop thick clients and mobile clients connect via Wi-Fi or provider networks.
- Worked on the Cisco VoIP SIP client-server (CUPC), interacting with the Cisco Presence server, Call Control Manager (CUCM), and MeetingPlace.
  - CUCM: call control through CTI in phone-associated mode, and through SIP messages in softphone mode, for voice, video, and conference calls
  - Presence server: SIP SUBSCRIBE, PUBLISH, and INFO. Supports the RPID XML format to interact with third-party presence servers, collect real-time buddy-list status, override your own status, and implement IM tools to communicate with third-party IMs
  - Security: TLS from SIP endpoint to proxy, verified by downloading over TFTP the CTL file containing the server certificate (PEM/DER format); TLS from proxy to SIP endpoint, verified with an MD5 digest authentication challenge including nonce and cnonce, across the RFC 2617 implementation, the Resip SIP stack, and OpenSSL

## Education

### PhD Student in Computer Science

UCLA

### Master's degree in Computer Engineering

University of Southern California

## Contact & Social

- LinkedIn: https://linkedin.com/in/vincentssu

---

Source: https://flows.cv/vincentsu
JSON Resume: https://flows.cv/vincentsu/resume.json
Last updated: 2026-04-12