Refactored a SQL-to-Python data pipeline through a ground-up redesign of poorly written legacy code, cutting response time from 30s to under 1s and resolving a major complaint among federal clients.
Led successful multi-team deployments of solutions to thousands of clients, including high-stakes federally aligned clients, using Oracle Cloud Infrastructure, Python, Jenkins, and shell scripting.
Simplified the Python/Node.js installation process for the legacy HealtheLife patient portal, reducing setup for new team members from a three-week ordeal to a single day.
Led setup and deployment of a new patient portal environment for the Sweden/Ireland region on Oracle Cloud Infrastructure, coordinating cross-team debugging of legacy code migration issues.
Migrated a highly complex, 10,000+ line lesson architecture to a modernized architecture with a team of six using Python, MongoDB, and React/Redux, focusing on redesigning tutor and student reservation scheduling for the new data model and weighing design tradeoffs.
Designed and implemented an optimized, thread-safe system for controlling group lesson availability for tutors and students that dynamically adjusts supply based on demand, significantly reducing the number of unfilled and barely filled group lessons.
Designed and launched group lessons, building the underlying backend Python and MongoDB lesson data structures as well as the frontend React features, including the tutor signup page and admin page changes. The feature has already drawn 1,000+ new subscribers with minimal marketing.
University of California, Los Angeles: VLSI Architecture & Synthesis Lab
Accelerated HLS code for dot product and k-nearest neighbors (kNN) running on AWS instances, achieving speedups of approximately 200x over naive implementations. Currently accelerating a U-Net neural network for biomedical image segmentation on FPGAs.
Prepared materials for and led weekly two-hour discussion sections of 30 students, presenting lecture, homework, exam, and lab material. Covered performance optimization for CPUs, GPUs, and FPGAs using SIMD operations, memory alignment, loop transformations, and cache-size-based blocking, speeding up code by a factor of up to 700x.
Redesigned large C++ workloads (over 10,000 lines) intended for AWS FPGAs into OpenCL implementations for Intel DevCloud, targeting the processing of terabyte-sized datasets. Integrated near-data computation technology into the C++ prototype code written for the previous task.