Tech Part:
* Designed and developed Timbr, a large-scale Meteor application (an automatic web crawler/scraper) that turns an arbitrary web page into structured JSON data and provides plug-and-play APIs and interfaces for users without programming experience
* Used Vue.js and D3.js in the front-end component of Timbr to capture and visualize users' actions on target websites, provide view components, and dynamically visualize the generated JSON data
* Used Node.js and Electron in the back-end component of Timbr to render pages and mimic users' actions in order to crawl and sample web pages
* Deployed Timbr on both AWS and local servers to crawl and sample millions of web pages from sites such as eBay, Amazon, and GitHub
Research Part:
* Focus on Markov chain Monte Carlo (random walk) methods for real-world web data mining, sampling from the hidden databases of online social networks, graph theory, and web information retrieval
* Estimate aggregate, dataset-wide statistics of online social networks from the collected samples, e.g., the total number of Facebook users
* Work on speeding up the convergence rate of state-of-the-art random walk sampling methods by about 50 percent on both graphical models and Bayesian hierarchical models
* Use a revised version of Timbr to demonstrate the on-the-fly sampling and estimation process of random walk methods on real-world websites such as Amazon and eBay
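
The random walk sampling and aggregate estimation mentioned in the research bullets can be illustrated with a minimal sketch. It is not the actual research code: the toy graph, the function names, and the choice of average degree as the aggregate are assumptions made for illustration. A simple random walk visits nodes in proportion to their degree, so each sample is re-weighted by 1/degree to correct that bias.

```python
import random

# Hypothetical toy "social network": an undirected graph as adjacency lists.
GRAPH = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["a", "c", "e"],
    "e": ["d", "f"],
    "f": ["e"],
}


def random_walk(graph, start, steps, burn_in, rng):
    """Simple random walk: at each step move to a uniformly chosen neighbor.

    The first `burn_in` steps are discarded so that the retained samples are
    closer to the walk's stationary distribution.
    """
    node = start
    samples = []
    for i in range(burn_in + steps):
        node = rng.choice(graph[node])
        if i >= burn_in:
            samples.append(node)
    return samples


def estimate_average_degree(graph, samples):
    """Estimate the average degree from degree-biased random walk samples.

    The walk's stationary probability of a node is proportional to its
    degree, so weighting every sample by 1/degree removes the bias. For the
    average degree this reduces to: number of samples / sum of 1/degree.
    """
    inverse_degree_sum = sum(1.0 / len(graph[v]) for v in samples)
    return len(samples) / inverse_degree_sum


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    samples = random_walk(GRAPH, start="a", steps=50_000, burn_in=1_000, rng=rng)
    true_average = sum(len(nbrs) for nbrs in GRAPH.values()) / len(GRAPH)
    print("estimated average degree:", estimate_average_degree(GRAPH, samples))
    print("true average degree:     ", true_average)
```

In a real setting the graph is not known in advance; the neighbor list of a node would be fetched on demand (for example, by crawling a profile page), which is why only local information (the current node's neighbors and degree) is used by the walk and the estimator.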