Greater New York City Area
• Developed a data pipeline that aggregates word counts from web pages so that advertisers can see which topics trend over time
• Designed a Flask web interface that lets users query the frequency of more than 370,000 English dictionary words and submit requests, handled with Airflow, for non-dictionary words
• Used AWS and Spark to process about 1 TB of web pages from the Common Crawl dataset
• Partitioned the 2018 data using PostgreSQL and TimescaleDB to optimize read and write performance