Boston, Massachusetts, United States
* Preprocessed 600+ clinical speech samples in Python using Stanza and spaCy, segmenting sentences and extracting 103 linguistic biomarkers feed into ML models and reduced clinician diagnostic time by 80%.
* Automated Python environment setup with a shell script to install libraries and created a Dockerfile for reproducible deployment, reducing onboarding time for team members from 2 hours to 10 minutes.
* Leveraged Amazon Transcribe with AWS S3 buckets and Lambda functions to transcribe 600+ clinical speech samples into text, automating speech-to-text conversion to eliminate the process of manual transcription.
* Implemented a semantic coherence error detection algorithm using cosine similarity of word embeddings to identify discourse-level impairments in patient speech, achieving 80% classification accuracy across coherence subtypes.
* Developed a novel unsupervised clustering algorithm for matching spoken utterances from aphasic and dementia patients to narrative concepts (e.g., Cinderella), achieving 83.7% accuracy on 10 patient transcripts, surpassing K-means by 10.7%, reducing clinical analysis time by 30%, and enabling personalized treatment for 50+ patients.