Experience
2022 — Now
United States
AI project:
Senior Staff Software Engineer – VMware/Broadcom/Omnissa (2022–Present)
• Designed CUDA/C++ pipelines improving throughput by 30%+ across large datasets
• Architected RDMA transport layers (RoCEv2, InfiniBand) for ultra-low latency systems
• Built distributed ML pipelines (Spark/Flink) handling millions of requests/sec
• Led GPU performance profiling and optimization across heterogeneous environments
• Implemented observability frameworks for distributed tracing and debugging
• Worked with parallel programming and communication runtimes such as MPI, NCCL, and UCX
• Worked with containers, cloud provisioning, and scheduling tools (Kubernetes, Docker)
• Experience with deep learning frameworks such as PyTorch and TensorFlow
• Worked with CUDA programming, GPUs, and machine learning concepts
• Worked with LLM APIs such as OpenAI GPT models and with open-source LLMs
such as Llama 2 and Llama 3 through tools such as Hugging Face.
• Architected and optimized RDMA transport protocols (RoCEv2, InfiniBand) to enable
ultra-low latency and high-throughput data movement for large-scale AI workloads
• Led efforts to profile and optimize inference workloads across NVIDIA and AMD
GPUs using CUDA, ROCm, and Triton, identifying kernel-level inefficiencies
• Architected a large-scale real-time bidding platform, processing millions of bidding
requests per second under strict sub-100ms latency SLAs.
• Designed and optimized distributed ML data pipelines using Apache Flink and
Spark for online learning and real-time inference.
• Built cloud-native inference pipelines on Kubernetes and AWS, optimizing GPU utilization,
latency, and throughput for large-scale model serving.
• Implemented observability and reliability frameworks like OpenTelemetry to monitor performance.
• Applied systems and GPU performance engineering expertise from VMware Cloud
• Contributed to open-source communities to enhance libraries such as CCCL, RAPIDS, and UCX
• MDM Server, Mobile Android, iOS, and Web on Cloud End User Computing
2020 — 2022
United States
Mobile iOS, Android, Web, and Gaming (Unity, Cocos)
• Worked with the team to design software architecture blocks and deliver
end-to-end software solutions
• Worked directly with the Business Development team as the key technical liaison for
video conferencing and streaming products
• Acted as the in-house mobile software architect for conference and live event
streaming SDKs covering audio, video, streaming to CDNs (YouTube, FB), and
RTMP/inject livestream to conference/room across iOS, Android, Web, and
gaming platforms
• Developed mobile and gaming applications with Xcode, Android Studio, Swift,
Kotlin, Unity, Objective-C, Java, and C++
• Worked on WebRTC with the Opus audio codec and VP8/H.264 video codecs, with
front-end web development experience (JavaScript, HTML5, CSS, React.js, Vue.js, Angular)
2018 — 2020
United States
• Linux, Mobile, Video Streaming/Transcoding/GPU, AI/ML/Jetson Nano,
GPU/CUDA, Ceph storage
• Led an in-house engineering team and worked with a counterpart team overseas
• Worked on NVR-based software on the Jetson Nano platform for media streaming
(GStreamer/FFmpeg) from distributed sources: ONVIF multicast discovery, RTSP
pause, replay, transcoding operations, and motion detection
(TensorFlow/OpenCV)
• NVR video streaming on NVIDIA HAL/H.264/H.265 drivers (GPU hardware
acceleration)
• NVR AI-based project for deep learning and machine learning on NVIDIA
Jetson Nano (CUDA)
• Cloud programming and implementation of a distributed file system on top of
Ceph/CentOS/Ubuntu, with storage services for object, file, and block devices
in the cloud
• API programming on the Ceph Object Gateway through the OpenStack Swift API
• RDMA (Remote Direct Memory Access) programming with UCX and Accelio
• Mobile-app-hosted personal AI hybrid storage system spanning the cloud and
private wares/NAS, built with iOS/Swift and Android/Java/Kotlin mobile
computing, photo management, video streaming (HLS), STUN, and UPnP
2012 — 2016
WebEx Mobile high performance client computing:
• Integrated a third-party voice engine into the WebEx Hybrid Media Gateway (VoIP, legacy VoIP, PSTN) for desktop and, notably, mobile clients
• Audio/video on Apple iOS/Xcode, Google Android phones (HTC/EVO, Samsung/Galaxy 6, Motorola/Droid), Cisco Cius, RIM BlackBerry, Apple Watch, and iPhone/iPad, with mobile VoIP development including native SIP, multicast, RTP, and RTCP stack support, and Mobicents SIP Servlets/Tomcat acting as a B2BUA
• Strong background in media technology, with a thorough understanding of RTP, RTCP, SIP, RTSP, and related standards and their implementations.
• Fluency in Objective-C, Swift, and iOS/Xcode development methodologies, plus Java and Kotlin Android development experience
• Experience with GUI front end & Media control and processing with different audio codecs along with memory profiling and optimization
2005 — 2016
• SIP/VoIP/RTP/RTCP/CUCM: Worked on the ENUM server (Galileo), a Linux-based address lookup server that integrates with ENUM (E.164 Number Mapping) data management to provide real-time audio/video call routing control between corporate users. Each corporate site consists of an SBC (Session Border Controller) and firewalls.
• SaaS/PaaS: WebEx web conferencing: users can use video/audio services for ad-hoc meetings, and all meeting data can be archived in the cloud. The data service involves virtualization and distributed computing. Client/server computing model: the application communication model sits in the cloud. Hybrid solutions for accessing applications are also provided: in the cloud, on-premises, or a combination of both. Linux servers are located within the cloud, while desktop thick clients and mobile clients connect via Wi-Fi or provider networks.
• Worked on the Cisco VoIP SIP client-server (CUPC) interacting with the Cisco Presence server, Call Control Manager (CUCM), and MeetingPlace.
CUCM: call control through CTI in phone-associated mode, and through SIP messages in softphone mode, for voice, video, and conference calls.
Presence Server: SIP SUBSCRIBE, PUBLISH, and INFO. Supported the RPID XML format to interact with third-party presence servers, collect real-time buddy list status, overwrite the user's own status, and implement IM tools to communicate with third-party IMs.
Security: TLS from SIP endpoint to proxy, verified by downloading over TFTP the CTL file containing the server certificate (PEM/DER format); TLS from proxy to SIP endpoint, verified with an MD5 digest authentication challenge including nonce and cnonce, using an RFC 2617 implementation, the Resip SIP stack, and OpenSSL.
Education
UCLA
PhD Student
University of Southern California