AI & ML Performance Engineer with expertise in GPU kernel optimization, inference acceleration, and multi-GPU programming, skilled in C++, CUDA, and Python. Experienced in deploying and optimizing LLMs on AWS with vLLM, building high-performance kernels, and reducing latency on NVIDIA and AMD architectures.

Experience

Together AI

ML SWE - Inference Platform

2025 — Now

San Francisco, California, United States

IpserLab

AI Engineer

2025 — 2025

Agot (Acquired by HME)

Computer Vision and Machine Learning Intern

2023 — 2023

Pittsburgh, Pennsylvania, United States

● Optimized object detection and segmentation models using DeepStream’s TensorRT integration, achieving a 40% increase in throughput via layer fusion, kernel auto-tuning, and memory bandwidth optimizations.

● Leveraged NVIDIA’s Deep Learning Accelerator (DLA) cores on Orin to offload compute-intensive workloads, balancing GPU and DLA execution for maximum throughput and power efficiency on edge devices.

● Engineered low-latency video pipelines by integrating RTSP streams with the NVIDIA DeepStream SDK, reducing end-to-end inference latency by 35%.

● Optimized segmentation models using NVIDIA TAO and DeepStream, achieving a 20% improvement in IoU and deploying efficiently on NVIDIA Xavier and Orin.

● Led the development and launch of an innovative food waste management solution, leveraging a novel ML algorithm for data forecasting and vision-based analysis, resulting in a 50% reduction in waste.

● Integrated vision-language models (GPT-4V, LLaVA) into computer vision pipelines, enabling multimodal scene understanding and improving complex scene interpretation accuracy by 30%.

● Developed and deployed Transformer-UNet-based segmentation and detection models on AWS SageMaker, orchestrating deployments on a Kubernetes cluster with Argo CD and Docker for seamless automation.

Education

Northeastern University

Master's degree

BMS Institute of Technology and Management

Bachelor of Engineering - BE