Raleigh-Durham, North Carolina Area
• Evolutionary Algorithms for Hyperparameter Optimization: Proposed and developed a hyperparameter optimization framework called OIL (Optimized Inductive Learning), which integrates evolutionary algorithms (e.g., Differential Evolution and NSGA-II) to improve software analytics tasks. OIL was tested on a wide range of optimizers with data from 945 software projects. Experimental results show that OIL improved effort estimation in both accuracy (winning 16 of 18 cases) and efficiency (reducing runtime from days to hours).
• Sequential Model Optimization for Software Effort Estimation: Designed FLASH, a sequential model-based optimization method (a.k.a. an active learning method), introducing this approach to the software effort estimation domain for the first time in order to improve software effort estimators. Under given computational cost constraints, FLASH efficiently finds good configurations of machine learning methods (e.g., CART) for effort estimation. Overall, it improves the accuracy of software effort estimation by 11% on average.
• Project Health Prediction for Open-Source Software: Investigated how predictive methods can support project health prediction. The study collected 78,455 months of data from 1,628 GitHub projects. A set of health indicators was defined based on the project development process and industrial domain knowledge, and predictive models based on random forests, SVM, and CART were proposed for project health prediction. Preliminary results show that project-level process actions can be predicted with high accuracy (10% error rate) when the hyperparameters of the prediction methods are tuned.