Experience

Onehouse, Software Engineer

2023 — Now

Sunnyvale, California, United States

Building a transactional solution for data lake workflows.

Facebook, Software Engineer

2018 — 2023

United States

1. End-to-end experience with query engine internals, from grammar to iterator execution;

2. Data storage life cycle management, from ingestion to retention (the hardest problem for in-memory databases).

Rocket Fuel, Software Engineer

2013 — 2018

Redwood City, CA

What I did:

1. Was responsible for the entire life cycle of the web service that processes product feeds for all advertisers using dynamic creative at RF;

2. Was responsible for the ETL pipelines and the high-performance web service that provides ad inventory insights for advertisers;

3. Was responsible for stable, high-performance cluster-to-cluster replication for data backup;

4. Built and maintained Hive to MySQL/Vertica data pipelines for data analysts;

5. Enabled and tuned the Tez engine for Hive queries, replacing the MR engine;

6. Developed and maintained the system-level test framework for Hive;

7. Analyzed users' Hive queries submitted to RFI clusters, searched for inefficient patterns, and modified Hive to optimize automatically when these patterns are encountered;

8. Troubleshot problems Hive users encountered during their work.

Kent State University, Research Assistant

2008 — 2013

Kent State University

Department of Computer Science, Research Assistant

Uncertain Network Structure Clustering

This project aimed to cluster uncertain network structures from an information-theoretic perspective, considering cluster purity and size balance. The algorithms were implemented in C++ on Linux. This work was published at ICDM 2012.

Highly Cohesive Group Discovery for Large Social Networks

This project aimed to discover highly cohesive groups in social networks, such as colleagues in a company, members of a Special Interest Group (SIG), or friends. The programs were implemented in C++ on Linux. This work was published at KDD 2011.

Trust Measurement for Large Social Networks

The goal of this project was to efficiently measure the potential trust level between any two people in a social network. The program was implemented in C++ on Linux. This work was published in PVLDB 2011.

Large Social Network Summarization

Large social networks (millions of vertices) are hard to visualize and understand. This project was launched to facilitate the storage and visualization of large network structures. In this project, I implemented the network summarization algorithm using bi-clique graphs in C++ on Linux.

Pattern Summarization for Large Transactional Database

Data analysis for large transactional databases typically focuses on frequent itemset mining. In this project, I was responsible for implementing the summarization algorithm in C++ on Linux. This work was published at KDD 2009.

Rocket Fuel Inc., Research Scientist Intern

2012 — 2012

Redwood City, CA

In this internship, I enhanced the Hive semantic analyzer and tested the accuracy and efficiency of Hive indexes.

Education

Kent State University

PhD

University of Electronic Science and Technology of China

MS

University of Electronic Science and Technology of China

BS