Experience
2023 — Now
Sunnyvale, California, United States
Building a transactional solution for data lake workflows.
2018 — 2023
United States
1. End-to-end experience with query engine internals, from grammar to iterator execution;
2. Data storage life cycle management, from ingestion to retention (the hardest problem for in-memory databases).
2013 — 2018
Redwood City, CA
What I have done or am doing:
1. Responsible for the entire life cycle of the web service that processes product feeds for all advertisers using dynamic creative in RF;
2. Responsible for the ETL pipelines and the high-performance web service that provides ad inventory insights for advertisers;
3. Responsible for stable, high-performance cluster-to-cluster replication for data backup;
4. Built and maintain Hive to MySQL/Vertica data pipelines for data analysts;
5. Enable and tune the Tez engine for Hive queries in place of the MR engine;
6. Developed and maintain the system-level test framework for Hive;
7. Analyze users' Hive queries submitted to RFI clusters, search for inefficiency patterns, and modify Hive to optimize automatically when these patterns are encountered;
8. Troubleshoot any issues Hive users encounter in their work.
2008 — 2013
Kent State University
Department of Computer Science, Research Assistant
• Uncertain network structure clustering
This project aims to cluster uncertain network structures from an information-theoretic perspective, balancing cluster purity and cluster size. The algorithms are implemented in C++ on Linux. This work was published in ICDM12.
• Highly Cohesive Group Discovery for Large Social Networks
This project aims to discover highly cohesive groups in social networks, such as colleagues in a company, members of a Special Interest Group (SIG), or friends. The programs are implemented in C++ on Linux. This work was published in KDD11.
• Trust Measurement for Large Social Networks
The goal of this project was to efficiently measure the potential trust level between any two people in a social network. The program was implemented in C++ on Linux. This work was published in PVLDB11.
• Large Social Network Summarization
Large social networks (millions of vertices) are hard to visualize and understand. This project was launched to facilitate the storage and visualization of large network structures. In this project, I implemented the network summarization algorithm using bi-clique graphs in C++ on Linux.
• Pattern Summarization for Large Transactional Database
Data analysis for large transactional databases typically focuses on frequent itemset mining. In this project, I was responsible for implementing the summarization algorithm in C++ on Linux. This work was published in KDD09.
2012 — 2012
Redwood City, CA
During this internship, I strengthened the Hive semantic analyzer and tested the accuracy and efficiency of Hive Index.
Education
Kent State University
PhD
University of Electronic Science and Technology of China
MS
University of Electronic Science and Technology of China