Traffic Quality Pattern Detection. Analyzing web logs to detect bot patterns, experimenting with a range of features such as clickstream patterns. Using Hadoop Pig to parse and filter the data, and R to analyze and visualize the results.
Using: Hadoop, Pig, R, clustering
Tag Recommendation. Generating new tags for business listings using Mahout's recommendation component (user-based and item-based recommendation). Also using FP-Growth to detect frequently associated tag patterns.
Using: Mahout, Lucene
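As a rough illustration of the tag co-occurrence idea, the sketch below counts tag pairs that appear together in at least `min_support` listings and uses them to suggest missing tags. It is a brute-force pair count in plain Python, not FP-Growth and not Mahout's API. The function names and the sample listings are hypothetical, invented for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_tag_pairs(listings, min_support):
    """Count listings containing each tag pair; keep pairs at or above min_support."""
    counts = Counter()
    for tags in listings:
        # Sort and de-duplicate so (a, b) and (b, a) count as the same pair.
        for pair in combinations(sorted(set(tags)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


def suggest_tags(existing_tags, pairs):
    """Suggest tags that frequently co-occur with a listing's existing tags."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a in existing_tags and b not in existing_tags:
            scores[b] += n
        elif b in existing_tags and a not in existing_tags:
            scores[a] += n
    # Highest co-occurrence count first.
    return [tag for tag, _ in scores.most_common()]


# Hypothetical listings, for illustration only.
listings = [
    ["pizza", "italian", "delivery"],
    ["pizza", "italian"],
    ["pizza", "delivery"],
    ["sushi", "japanese"],
]
pairs = frequent_tag_pairs(listings, min_support=2)
suggestions = suggest_tags({"italian"}, pairs)
```

A listing tagged only "italian" would be offered "pizza" here, because that pair meets the support threshold. A real FP-Growth run would also mine itemsets larger than pairs without enumerating every combination.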
SEO Linear Modeling. Building a linear model to score content value and to find the hidden features that are important to SEO performance.
Using: R, linear regression
Matching Classification Modeling. Removing duplicate business listings by building a matching model, then tuning the models to increase precision.
Using: Hadoop, Pig, Lucene, Weka, decision tree, SVM, logistic regression
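A minimal sketch of the matching setup, assuming listings carry `name` and `address` fields (the field names, the features, and the sample records are assumptions, not taken from the project). Each candidate pair is turned into similarity features. A fixed threshold on their average stands in for the trained classifier (decision tree, SVM, or logistic regression), and precision is measured against labeled pairs.

```python
def tokens(text):
    """Lowercase the text, treat commas and periods as spaces, and split into a token set."""
    return set(text.lower().replace(",", " ").replace(".", " ").split())


def jaccard(a, b):
    """Jaccard similarity between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def pair_features(x, y):
    """Similarity features for one candidate pair of listings."""
    return (jaccard(x["name"], y["name"]), jaccard(x["address"], y["address"]))


def is_duplicate(x, y, threshold=0.6):
    # Stand-in for a trained classifier: average the features and
    # compare against a fixed, hand-picked threshold.
    name_sim, addr_sim = pair_features(x, y)
    return (name_sim + addr_sim) / 2 >= threshold


def precision(predictions, labels):
    """Fraction of predicted duplicates that are true duplicates."""
    tp = sum(1 for p, l in zip(predictions, labels) if p and l)
    fp = sum(1 for p, l in zip(predictions, labels) if p and not l)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical listings, for illustration only.
a = {"name": "Joe's Pizza", "address": "12 Main St, Springfield"}
b = {"name": "Joe's Pizza Inc.", "address": "12 Main St., Springfield"}
c = {"name": "Main Street Sushi", "address": "48 Oak Ave, Springfield"}

same = is_duplicate(a, b)
different = is_duplicate(a, c)
```

Raising the threshold generally trades recall for precision, which is the kind of tuning the description refers to. A trained model would learn the feature weights from labeled pairs instead of averaging them.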