Applied NLP to Slack chat logs: expert finding, topic modeling, and text classification. Improved search quality. Later built full-stack features such as web search and external note sharing.
Machine learning
===============
Expert finding:
•TF-IDF-based model for recommending experts given a query string
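A minimal sketch of how TF-IDF-based expert ranking can work, not the original implementation. It assumes each author's messages are concatenated into one document and that experts are ranked by cosine similarity to the query; the function names, the smoothed IDF formula, and the sample messages are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_expert_index(messages):
    """Build one L2-normalized TF-IDF vector per author from (author, text) pairs.

    Assumption: all messages by an author are treated as a single document.
    """
    term_counts = {}
    for author, text in messages:
        term_counts.setdefault(author, Counter()).update(tokenize(text))

    n_authors = len(term_counts)
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    # Smoothed IDF, so a term used by every author still gets a positive weight.
    idf = {t: math.log((1 + n_authors) / (1 + df)) + 1 for t, df in doc_freq.items()}

    vectors = {}
    for author, counts in term_counts.items():
        vec = {t: tf * idf[t] for t, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors[author] = {t: w / norm for t, w in vec.items()}
    return vectors, idf


def recommend_experts(query, vectors, idf, top_n=3):
    """Rank authors by cosine similarity between the query and their vector."""
    q_counts = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items()}
    q_norm = math.sqrt(sum(w * w for w in q_vec.values()))
    if q_norm == 0:
        return []  # no query term appears anywhere in the corpus
    scores = {}
    for author, vec in vectors.items():
        score = sum(w * vec.get(t, 0.0) for t, w in q_vec.items()) / q_norm
        if score > 0:
            scores[author] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Made-up sample data, for illustration only.
messages = [
    ("alice", "The Elasticsearch index needs a new analyzer for search queries"),
    ("alice", "Reindexing Elasticsearch after the mapping change"),
    ("bob", "Updated the React component and the CSS for the sidebar"),
    ("carol", "Search results look off, maybe the analyzer is wrong"),
]
vectors, idf = build_expert_index(messages)
ranking = recommend_experts("elasticsearch analyzer", vectors, idf)
```

Here `ranking` lists alice first and carol second; bob is left out because none of his messages share a term with the query.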
Topic modeling:
•Pipeline for cleaning text data: removing stop words and first names, and keeping only nouns
•langid-based language filter for keeping only English-speaking teams
•Grouped the cleaned-up messages into 50 topics using LDA
•Visualized results using pyLDAvis
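The cleaning step above can be sketched roughly as follows. This is an illustration under assumptions, not the original pipeline: the word lists are tiny placeholders, and `is_noun` is a hypothetical hook standing in for whatever part-of-speech tagger was actually used. Language filtering and LDA itself are left out.

```python
import re

# Tiny illustrative lists; a real pipeline would load much larger ones.
STOP_WORDS = {"a", "an", "and", "the", "is", "to", "of", "for", "in", "on", "we", "it"}
FIRST_NAMES = {"alice", "bob", "carol"}


def clean_message(text, is_noun=None):
    """Return the tokens of a message that survive all filters.

    is_noun is a hypothetical hook: a callable that takes a token and
    returns True for nouns. When it is None, the noun filter is skipped.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = []
    for token in tokens:
        if token in STOP_WORDS or token in FIRST_NAMES:
            continue  # drop stop words and first names
        if is_noun is not None and not is_noun(token):
            continue  # drop anything the tagger does not consider a noun
        kept.append(token)
    return kept


# Stand-in for a POS tagger: a hand-written noun set, for demonstration only.
DEMO_NOUNS = {"logo", "draft", "feedback", "homepage"}

cleaned = clean_message(
    "Bob, can we review the logo draft and send feedback to Alice?",
    is_noun=DEMO_NOUNS.__contains__,
)
```

With the demo noun set, `cleaned` is `["logo", "draft", "feedback"]`. Token lists like this one are what a topic model such as LDA would then consume as its bag-of-words input.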
Text classification:
•Collected a labeled dataset for classifying messages into design/non-design
•Built an additional filtering step for design messages using a vocabulary from design-related Wikipedia pages
•Used fastText as an initial baseline (both with pretrained GloVe word vectors and with word vectors trained from scratch). Got an F1 score of 0.83
•Reproduced a colleague's bidirectional LSTM classifier using Keras. Got an F1 score of 0.86
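For reference, a short sketch of how an F1 score for a binary design/non-design classifier can be computed. It assumes the score is reported for the positive ("design") class; the source does not say which averaging was used, and the labels below are made up for illustration.

```python
def f1_score(y_true, y_pred, positive="design"):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels, for illustration only.
y_true = ["design", "design", "design", "other", "other", "design"]
y_pred = ["design", "design", "other", "design", "other", "design"]
score = f1_score(y_true, y_pred)
```

In this toy example there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and the F1 score is 0.75.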
Technologies used: Jupyter, pandas, fastText (text classification), gensim (LDA), Keras (bidirectional LSTM)
Search quality
============
•Collected a gold dataset of queries and the expected positive results for those queries
•Built a search quality evaluation tool
•Iterated on the Elasticsearch model
•Improved top-3 recall on that gold dataset from an initial 30% to 90%
•Had a follow-up call with an external Elasticsearch consultant
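A rough sketch of what a top-k recall check over a gold dataset might look like. It assumes "top-3 recall" means the share of gold queries with at least one expected result among the first three hits; the source does not spell out the exact definition. `fake_search` and the note IDs are hypothetical stand-ins for a real search backend.

```python
def top_k_recall(gold, search, k=3):
    """Fraction of gold queries with at least one expected result in the top k.

    gold maps each query string to a set of expected document IDs.
    search is any callable returning a ranked list of document IDs.
    """
    if not gold:
        return 0.0
    hits = 0
    for query, expected in gold.items():
        top = search(query)[:k]
        if any(doc_id in expected for doc_id in top):
            hits += 1
    return hits / len(gold)


# Made-up gold dataset, for illustration only.
gold = {
    "onboarding checklist": {"note-12"},
    "q3 roadmap": {"note-7", "note-9"},
    "vacation policy": {"note-3"},
}

# Canned rankings standing in for a real search backend.
fake_results = {
    "onboarding checklist": ["note-12", "note-40", "note-2"],
    "q3 roadmap": ["note-1", "note-5", "note-9", "note-7"],
    "vacation policy": ["note-8", "note-6", "note-4", "note-3"],
}


def fake_search(query):
    """Hypothetical search function returning a ranked list of note IDs."""
    return fake_results.get(query, [])


recall_at_3 = top_k_recall(gold, fake_search, k=3)
```

Two of the three toy queries have an expected note in the top three, so `recall_at_3` comes out at two thirds. Passing the search call in as a function keeps the metric independent of the backend, which makes it easy to rerun after each change to the search setup.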
Full-stack
========
•External note sharing (via public link)
•Note reactions (similar to Facebook reactions)
Technologies used: Python, React, CSS, HTML