1. Worked on a post-processing module for optical character recognition (OCR) output from AT&T contracts that predicts the best replacement for each misspelling using natural language processing techniques
2. Conducted error analysis with big data tools such as PySpark, Hadoop, and SolrUI on 4 million image-extracted text documents and discovered two major patterns in the misspellings
3. Reviewed the existing solution
4. Designed a process covering tokenization, using several alternative approaches, and correction, using the noisy channel model
5. Implemented a word corrector from scratch in Python, tailored to AT&T contracts, which increased the Word Recognition Rate by 16%
6. Created an AT&T dictionary for NLP workers within the company
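
For illustration only, the noisy channel correction named in item 4 can be sketched as follows. This is not the project's actual code: the function names (`edits1`, `correct`), the toy word counts, and the constant channel probabilities are all assumptions made for the example. The sketch scores each candidate word w for an observed token x by P(w) * P(x | w), where P(w) comes from word frequencies and P(x | w) is a crude constant for any single-edit error.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return all strings one deletion, transposition, substitution, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(observed, counts, p_same=0.95, p_edit=0.01):
    """Pick the word w maximizing P(w) * P(observed | w).

    p_same and p_edit are assumed channel probabilities (hypothetical values):
    the chance a word is read correctly, and the chance of one specific
    single-edit error. A real system would estimate these from error data.
    """
    total = sum(counts.values())
    scores = {}
    if observed in counts:
        # The observed token is itself a known word.
        scores[observed] = (counts[observed] / total) * p_same
    for candidate in edits1(observed):
        if candidate in counts and candidate != observed:
            scores[candidate] = (counts[candidate] / total) * p_edit
    if not scores:
        return observed  # No known candidate; leave the token unchanged.
    return max(scores, key=scores.get)


# Toy frequency table standing in for a domain dictionary (assumed data).
toy_counts = Counter({"contract": 50, "contact": 30, "term": 20})
```

With these toy counts, `correct("contrct", toy_counts)` considers both "contract" and "contact" (each one edit away) and returns the more frequent "contract", while a known word such as "contact" is left unchanged.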