Founding member of the Advanced Data Analytics team within BlackRock's Financial Modeling Group. Primary focus on large-scale data processing, modeling, and visualization using Apache Spark and D3.js.
• Architected a data warehousing and modeling pipeline for a borrower-level mortgage dataset (TB+ scale) covering data onboarding, feature extraction, aggregation, and modeling using Scala, Protobuf, Spark, and Parquet. Contributed fixes for bugs uncovered by the pipeline back to the Spark project.
• Iterated on machine learning models for mortgage prepayment using R and Spark MLlib. Models included k-means, GLM, k-NN, and Random Forest.
• Authored novel visualizations of mortgage data (parallel coordinates, scatter plot matrices, etc.) and of model performance using R, D3.js, and Tableau.
• Collaborated on a high-dimensional big data visualizer: binned aggregation using Spark and HBase; web app interface using AngularJS and D3.js.
• Designed and developed a SparkR DSL package that uses metaprogramming to bootstrap itself dynamically from Scala reflection.
• Developed R packages integrating R with the enterprise environment and the Hadoop platform.
• Evangelized use of R Markdown and R Shiny for reproducible and interactive data science work.
• Worked on Pig-based ETL and analytics pipelines and Pig UDFs in Java/Scala.
• Experienced in using Scala macros to eliminate boilerplate code while maintaining static type safety and native performance.