LLMs @ Otter.ai, Machine Learning Engineer and Founder
My primary focus is machine learning and its application across domains. I strive to build tools that augment human productivity, creativity, and capability. Feel free to reach out at dshah3@outlook.com.
• Co-founded an AI company that uses ML to identify and fix vulnerabilities in smart contracts, raising $1.7M from investors including Alchemy and Symbolic Capital, with advisors and angels from Meta, Apple, and Ledger
• Trained a lightweight graph neural network (GNN) with NetworkX and PyTorch Geometric to classify cross-contract and cross-function reentrancy from control flow graphs (CFGs)
• Spearheaded an effort to use SFT and RLHF to fine-tune and steer LLMs (Llama2-70B and Mistral-Medium) toward identifying vulnerable code that evaded traditional static analyzers such as Slither
• Augmented a fuzzer with a small Transformer, fitted with a custom vocabulary and tokenizer, to filter transactions by the likelihood that they broke predefined invariants written in Echidna
• Built hosting and infrastructure for model training, testing, and deployment on AWS SageMaker, Docker, and Modal, automated and scaled with Terraform
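The reentrancy classifier above could be sketched roughly as follows: a minimal one-layer graph convolution in plain PyTorch over a NetworkX CFG, pooled to a single graph-level logit. All names here are hypothetical, and the node features (e.g. opcode-category counts per basic block) are an assumption, not the production feature set.

```python
import networkx as nx
import torch
import torch.nn as nn

class TinyGCN(nn.Module):
    """Minimal graph conv: H' = relu(A_hat @ H @ W), mean-pooled to one logit."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, hid_dim)
        self.head = nn.Linear(hid_dim, 1)

    def forward(self, a_hat, x):
        h = torch.relu(a_hat @ self.lin(x))  # one round of message passing
        return self.head(h.mean(dim=0))      # graph-level pooling -> logit

def normalized_adjacency(g: nx.DiGraph) -> torch.Tensor:
    """A_hat = D^-1/2 (A + I) D^-1/2 over the undirected skeleton of the CFG."""
    a = torch.tensor(nx.to_numpy_array(g.to_undirected()), dtype=torch.float32)
    a = a + torch.eye(a.shape[0])
    d = a.sum(dim=1).rsqrt()
    return d[:, None] * a * d[None, :]

torch.manual_seed(0)
# Toy CFG: 4 basic blocks with a back edge (2 -> 1), as in a reentrant call loop.
cfg = nx.DiGraph([(0, 1), (1, 2), (2, 1), (1, 3)])
feats = torch.randn(4, 8)  # hypothetical per-block features
logit = TinyGCN(8, 16)(normalized_adjacency(cfg), feats)
prob_reentrant = torch.sigmoid(logit)
```

In practice PyTorch Geometric's built-in convolution layers and batching would replace the hand-rolled adjacency math; this sketch just shows the shape of the computation.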
• Collaborated with postdocs to create SUNS, a shallow U-Net in TensorFlow that segments neurons in two-photon calcium imaging videos
• Created an active learning pipeline that selectively labels neurons based on their frequency and SNR across successive SUNS runs, reducing the number of labeled neurons needed to reach SOTA performance by 50x
• Developed Bash scripts to automate jobs across GPU clusters and monitored active learning runs with custom Python scripts that probe aggregated results
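The selection step of an active learning loop like the one above could look roughly like this: rank neurons by how often they are detected across runs, weighted by how ambiguous their SNR is, since labels are most informative for borderline detections. The scoring rule and the `snr_band` threshold are illustrative assumptions, not the pipeline's actual criterion.

```python
import numpy as np

def select_for_labeling(freq, snr, k=10, snr_band=(2.0, 5.0)):
    """Pick the k neurons most worth labeling: frequently detected across
    SUNS runs, but with SNR in the ambiguous mid-band."""
    freq = np.asarray(freq, dtype=float)
    snr = np.asarray(snr, dtype=float)
    # Ambiguity peaks at the center of the SNR band and falls off outside it.
    center = (snr_band[0] + snr_band[1]) / 2
    half = (snr_band[1] - snr_band[0]) / 2
    ambiguity = np.clip(1 - np.abs(snr - center) / half, 0, None)
    score = freq * ambiguity
    return np.argsort(score)[::-1][:k]

# Neurons 0 and 1 sit in the ambiguous SNR band; neuron 2 is clearly strong.
picked = select_for_labeling([10, 1, 8], [3.5, 3.5, 10.0], k=2)
```

Each round, only the picked neurons would go to a human labeler before the next SUNS run.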
• Implemented an approach in PyTorch to estimate training-data influence by tracing gradient descent with Weights & Biases logs and Torch Studio
• Extended the approach to object detection to find false negatives in human-labeled datasets, treating mislabeled instances as their own class based on mixed confidences across their originally labeled classes
• Detected over 50% of false negatives in self-driving datasets and reported them to human labelers as feedback; deployed the feedback model on AWS SageMaker
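The influence-tracing idea above is in the spirit of TracIn: at each logged checkpoint, a training example's influence on a test point is approximated by the learning rate times the dot product of their loss gradients. A minimal single-checkpoint sketch, with a hypothetical `tracin_influence` helper and a toy linear model standing in for the real detector:

```python
import torch
import torch.nn as nn

def tracin_influence(model, loss_fn, train_batch, test_point, lr=0.1):
    """TracIn-style influence of each training example on a test point at one
    checkpoint: lr * <grad_train_i, grad_test>. Positive = the example's
    gradient step helped the test point; negative = it hurt."""
    xt, yt = test_point
    test_grads = torch.autograd.grad(loss_fn(model(xt), yt), model.parameters())
    flat_test = torch.cat([g.flatten() for g in test_grads])
    scores = []
    for x, y in zip(*train_batch):
        grads = torch.autograd.grad(loss_fn(model(x[None]), y[None]),
                                    model.parameters())
        flat = torch.cat([g.flatten() for g in grads])
        scores.append(lr * torch.dot(flat, flat_test).item())
    return scores

torch.manual_seed(0)
model = nn.Linear(3, 1)          # toy stand-in for the detection model
loss_fn = nn.MSELoss()
xs, ys = torch.randn(4, 3), torch.randn(4, 1)
scores = tracin_influence(model, loss_fn, (xs, ys), (xs[:1], ys[:1]))
```

In the full setting these per-checkpoint scores would be summed over the checkpoints recovered from the Weights & Biases logs; training examples with strongly negative influence on correctly-labeled test points are candidates for being mislabeled.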