I’m a software engineer with a deep focus on CUDA development and CUDA kernel optimization for deep learning inference, ML infrastructure performance optimization, ML backbones, high-performance computing, embedded platforms, and GPU-accelerated pipelines.
Re-architected a legacy C++/Qt application into a scalable MVVM-based design with a full QML frontend, enabling advanced UI logic, modularity, and long-term extensibility
•
Designed and implemented real-time telemetry and image publishing interfaces for seamless integration with a Python-based scientific alignment and control system
•
Developed high-performance centroiding and filtering algorithms, including gamma correction and iteratively weighted center-of-mass (CoM) refinement for precise spot tracking
•
Delivered tools such as real-time histogramming, CSV-based logging, and robust configuration preset management
•
Enabled long-term automation by decoupling GUI from backend logic and preparing the pipeline for remote command/control
•
Provided production-grade code and documentation to support critical alignment milestones and prepare the software stack for future multi-camera scalability
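For readers unfamiliar with the centroiding technique named above, the following is a minimal C++ sketch of gamma correction followed by iteratively weighted center-of-mass refinement. It is an illustration under stated assumptions, not the production code: the function names (`gammaCorrect`, `iterativelyWeightedCoM`), the Gaussian weighting window, and the `sigma` and `iterations` parameters are all hypothetical choices made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Centroid {
    double x;
    double y;
};

// Gamma-correct an image whose pixels are normalized to [0, 1]: out = in^gamma.
std::vector<double> gammaCorrect(const std::vector<double>& img, double gamma) {
    std::vector<double> out(img.size());
    for (std::size_t i = 0; i < img.size(); ++i) {
        out[i] = std::pow(img[i], gamma);
    }
    return out;
}

// Iteratively weighted center of mass (assumed variant): start from the plain
// CoM, then repeatedly recompute it with a Gaussian weight centered on the
// previous estimate, which suppresses background far from the spot.
Centroid iterativelyWeightedCoM(const std::vector<double>& img,
                                std::size_t width, std::size_t height,
                                double sigma, int iterations) {
    assert(width > 0 && height > 0 && img.size() == width * height && sigma > 0.0);

    // Pass 0: unweighted center of mass.
    double sum = 0.0, sx = 0.0, sy = 0.0;
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const double v = img[y * width + x];
            sum += v;
            sx += v * static_cast<double>(x);
            sy += v * static_cast<double>(y);
        }
    }
    if (sum <= 0.0) {
        // Empty image: fall back to the geometric center.
        return {(width - 1) / 2.0, (height - 1) / 2.0};
    }
    Centroid c{sx / sum, sy / sum};

    // Refinement passes with a Gaussian window around the current estimate.
    const double inv2s2 = 1.0 / (2.0 * sigma * sigma);
    for (int it = 0; it < iterations; ++it) {
        sum = sx = sy = 0.0;
        for (std::size_t y = 0; y < height; ++y) {
            for (std::size_t x = 0; x < width; ++x) {
                const double dx = static_cast<double>(x) - c.x;
                const double dy = static_cast<double>(y) - c.y;
                const double w = std::exp(-(dx * dx + dy * dy) * inv2s2);
                const double v = img[y * width + x] * w;
                sum += v;
                sx += v * static_cast<double>(x);
                sy += v * static_cast<double>(y);
            }
        }
        if (sum <= 0.0) break;  // weights underflowed; keep the last estimate
        c = {sx / sum, sy / sum};
    }
    return c;
}
```

A fixed Gaussian window is only one possible weighting; other variants weight by a power of the intensity or shrink the window on each pass.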
Architected and developed a modular perception Software Development Kit (SDK) for autonomous systems deployed on Jetson Orin, integrating 8-camera capture (V4L2) with TensorRT inference pipelines for real-time computer vision (depth, disparity, polarization).
•
Wrote and optimized CUDA kernels for depth processing, spatial filtering (with shared memory), and tensor layout conversion as part of a real-time inference pipeline on Jetson Orin; combined these with stream-level parallelism and overlapped memory transfers to reduce preprocessing latency by 47% and boost throughput by 85%.
•
Improved backend throughput and stability by overlapping data capture, preprocessing, and TensorRT inference using CUDA APIs.
•
Built a real-time point cloud visualizer using Qt and OpenGL, transforming a minimal graphics example into a fully live 3D viewer synchronized with RGB and depth maps.
•
Implemented an interactive pixel-level depth prober with <1-frame latency, allowing users to hover over RGB or depth views to query accurate depth in real time.
•
Delivered robust, real-time solutions under high-reliability constraints for clients in various industries.
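To illustrate the tensor layout conversion mentioned above, here is a CPU reference sketch in C++ that turns an interleaved HWC 8-bit image into a planar CHW float tensor, the layout inference engines commonly expect. This is not the CUDA kernel described in this role: the function name `hwcToChw` and the `scale` parameter are hypothetical, and a GPU version would typically assign one output element per thread instead of looping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert an interleaved HWC uint8 image to a planar CHW float tensor.
// Each output value is the input byte multiplied by `scale`
// (for example 1/255 to normalize to [0, 1]).
std::vector<float> hwcToChw(const std::vector<std::uint8_t>& hwc,
                            std::size_t height, std::size_t width,
                            std::size_t channels,
                            float scale = 1.0f / 255.0f) {
    assert(hwc.size() == height * width * channels);

    std::vector<float> chw(hwc.size());
    for (std::size_t c = 0; c < channels; ++c) {
        for (std::size_t y = 0; y < height; ++y) {
            for (std::size_t x = 0; x < width; ++x) {
                // Source index: pixel-major, channels interleaved.
                const std::size_t src = (y * width + x) * channels + c;
                // Destination index: channel-major, one full plane per channel.
                const std::size_t dst = (c * height + y) * width + x;
                chw[dst] = static_cast<float>(hwc[src]) * scale;
            }
        }
    }
    return chw;
}
```

Fusing normalization into the same pass, as `scale` does here, avoids a second traversal of the image; the same idea carries over to a GPU kernel.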
In this role, I am responsible for the design and development of control software for distributed systems, focusing on the Automated Alignment System for the interferometer. I assess needs and formulate the system requirements with the system designer and the principal software engineer. My work spans data collection methods (temperature sensors, cameras, quad cells), the development of efficient data processing algorithms, and system control software using a variety of communication protocols.
Some notable achievements include:
•
I conceptualized and created a proof of concept for the integration of Qt/C++ applications with the central Interferometer Supervisory System.
•
I developed an application for a back-end stability camera, interfaced via CameraLink over the network and built in C++ with the Qt framework; its GUI uses OpenCV and the Eigen matrix library to calibrate the beam relay system. This tool is critical in addressing the stability of the system. As part of the same project, I also trained and implemented two deep convolutional neural networks, DnCNN and U-Net, in C++ using PyTorch to assist with wavefront reconstruction and measurement within the software, allowing for more accurate measurements in the presence of downscaling and atmospheric turbulence.
•
I delivered a Java application to operate the LabJack units controlling the power input for the laser and filament light injection systems.