# Forest Fang

> AI · viz · d/acc

Location: San Francisco, California, United States

Profile: https://flows.cv/forest

I am passionate about understanding large-scale datasets using machine learning and interactive visualizations. I have extensive experience tackling large-scale data modeling and the associated distributed computing challenges using Scala/Java and the Hadoop stack. I also specialize in developing visualizations using D3.js, React, and R/R Shiny.

Visualizing large datasets requires high-throughput, low-latency compute to render graphics quickly and at a manageable size. I have extensive experience working with large-scale datasets on Hadoop using Apache Pig, Apache Hive, and now Apache Spark. I'm excited about Spark's performance and elegant Scala programming interface.

My current focus is exploring better data science practices through the lens of functional programming, and how to effectively leverage responsive interactivity in data visualization.

Specialties: Scala, R, JavaScript, D3.js, Java / Hadoop stack including Spark, Pig, Hive, HDFS / Data Warehousing, Machine Learning, Data Science, Data Modeling

## Work Experience

### Staff Software Engineer @ Stripe

Jan 2021 – Jan 2024

### Data Science Engineer @ Stripe

Jan 2018 – Jan 2021 | San Francisco Bay Area

My team supports data scientists and everyone at Stripe in practicing data science. We believe in extending data scientists' skill sets and capabilities more than in making it easy to repeat the same old work. In addition to abstracting complexity, we spend a lot of time and effort smoothing out the learning curve and building the right foundation so that data scientists feel confident using advanced languages and frameworks, such as building dashboards in React or preparing analytical data in Spark, and can make "impossible" things possible.
### Machine Learning Engineer @ Airbnb

Jan 2016 – Jan 2018 | San Francisco Bay Area

Machine Learning and Data Analytics

- Advanced the Airbnb Home teams' understanding, communication, and prioritization of supply and demand through work on market definition, a market intelligence dashboard, and guest-perceived availability.
- Reconciled the main Booking Probability model with Theoretical Elasticity for the listing revenue forecasting model.
- Designed the Long Lead Day Pricing model, pioneering the use of Transfer Learning and Deep Learning. Iterated on the Booking Probability model using distributed GAM.

Distributed Computing and Infrastructure

- Optimized various distributed data pipelines, achieving 10-100x speedups. Documented and shared optimization insights.
- Developed a Spark library that significantly reduces boilerplate code and user errors and boosts iteration speed when writing distributed ETL applications at Airbnb.
- Developed a data normalization framework for external data harmonization.

Data Science Enrichment and Partnership

- Top contributor to internal R packages.
- Established best practices for R package development, R dependency isolation in Airflow, and other R infrastructure for iteration and deployment.
- Served as engineering partner on the Data Science Technology Council.
- Advocated for internal R education. Designed and taught a Data Visualization in R course.

### Associate @ BlackRock

Jan 2013 – Jan 2016

Founding member of the Advanced Data Analytics team within BlackRock's Financial Modeling Group. Primarily focused on large-scale data processing, modeling, and visualization using Apache Spark and D3.js.

- Architected a data warehousing and modeling pipeline for a mortgage borrower-level dataset (TB+ size) covering data onboarding, feature extraction, aggregation, and modeling using Scala, Protobuf, Spark, and Parquet. Contributed bug fixes identified from the pipeline back to the Spark project.
- Iterated on mortgage prepayment machine learning models using R and Spark MLlib. Models included k-Means, GLM, k-NN, Random Forest, etc.
- Authored novel data visualizations for mortgage data (parallel coordinates, scatter plot matrix, etc.) and for model performance using R, D3.js, and Tableau.
- Collaborated on a high-dimensional big data visualizer: binned aggregation using Spark and HBase; web app interface using Angular.js and D3.js.
- Designed and developed a SparkR DSL package that dynamically bootstraps itself from Scala reflection using metaprogramming.
- Developed R packages integrating the enterprise environment and Hadoop platform with R.
- Evangelized the use of R Markdown and R Shiny for reproducible and interactive data science work.
- Worked on Pig-based ETL, analytics pipelines, and Pig UDFs in Java/Scala.
- Experienced in using Scala Macros to eliminate boilerplate code while maintaining static type safety and native performance.

### Intern Analyst @ BlackRock

Jan 2012 – Jan 2012

Recruited to develop data visualization and reporting tools for financial modeling analytics. Worked on unifying the data retrieval process and providing an interactive reporting application.

### Consultant, Software Engineer @ International Livestock Research Institute

Jan 2011 – Jan 2011 | Nairobi, Kenya

Recruited to develop a well-documented system capable of seamless integration into existing infrastructure, enabling automatic updating of the Index Based Livestock Insurance (IBLI) index, automatic projection of data and the index onto variously prescribed maps and graphs, and automatic dissemination of customized information to predetermined outlets.

- Delivered the automation system project in only two weeks, against an expected completion time of two months and an anticipated cost of $20,000.
- Wrote system documentation for future reference by both developers and users.
- Leveraged C# for high-level, user-friendly operations including data capture, integration, and projection.
- Used MATLAB for speed-critical computations such as data filtering and index calculation.
- Reduced the report computation cycle from five hours to 10 minutes through parallel computing, optimized data structures, effectively vectorized matrix operations, and good programming practice.

## Education

### Bachelor of Arts in Mathematics, Computer Science

Cornell University

Jan 2009 – Jan 2013

### High School

Cheshire Academy

Jan 2007 – Jan 2009

### Shanghai Nanyang Model High School

Jan 2005 – Jan 2007

## Contact & Social

- LinkedIn: https://linkedin.com/in/forestfang
- GitHub: https://github.com/saurfang
- Website: https://medium.com/@saurfang/

---

Source: https://flows.cv/forest

JSON Resume: https://flows.cv/forest/resume.json

Last updated: 2026-03-22