Designed and implemented a multi-threaded, asynchronous data pipeline in Rust to replace the legacy C# implementation that processes billions of audit logs per day.
Created a spatial hardware inventory dataset and built a machine learning pipeline for failure detection, reaching an 80% true positive rate on an imbalanced dataset.
Developed an end-to-end, multi-modal data preprocessing and ML pipeline for image caption classification. Published a thesis paper, supervised by Professor Gabriel Kreiman.
Created a custom image-text dataset, generated contextual embeddings from BERT and ResNet-18, and applied PCA dimensionality reduction to improve efficiency. Implemented training and inference with ML classifiers (SVM, Naive Bayes, DNN) on compressed contextual embeddings, achieving 70% accuracy (competitive with SOTA) with a linear SVM, compared to 40.4% with static embeddings.
Implemented post-training attention head pruning (magnitude- and gradient-based) for a transformer pipeline. Benchmarked inference performance with ONNX Runtime on an AWS instance with Docker. Wrote a PyTorch-to-ONNX converter and backing for the FastT5 transformer pipeline, resulting in a 2x inference speedup.
Education
Harvard John A. Paulson School of Engineering and Applied Sciences