Software developer with expertise in web development, machine learning (specifically NLP), data analysis and visualization.
Programming languages: Python, JavaScript, SQL, R. Some familiarity with HTML & CSS.
Worked on backend systems for web-based research projects built using Django.
Defined schemas and set up PostgreSQL databases to store user-provided speech data collected via Twilio IVR flows. Exposed the data via RESTful APIs to power the frontends.
Developed data processing pipelines for transcription, timestamp generation and annotation of collected speech data.
Set up machine learning infrastructure using AWS SageMaker for training and deployment of custom NLP models for real-time inference.
Developed a Python module, built on FFmpeg and other image processing libraries, that generated a 24/7 live video feed serving as the main content for the projects.
Completed a 6-week training program in data engineering best practices at the Lab for Social Machines at MIT.
Deployed a containerized application to a Kubernetes cluster that collected streaming data from Wikipedia’s recent changes API, using RabbitMQ as a message broker. Data was stored as compressed Parquet files on S3.
Developed a novel topic modeling algorithm based on density-based clustering of document embeddings, used to automatically discover topics in hundreds of hours of transcribed speech data.