Staff-level software engineer and researcher specializing in large-scale data and AI infrastructure, with deep expertise in distributed systems, query processing, and ML-powered platforms at production scale.

LinkedIn

Staff Software Engineer

2024 — Now

Spark tuning: Led the performance optimization of Spark-based data manipulation pipelines for search index building. Reduced the sales search data pipeline's execution time from 7 hours to 3 hours.

Managed model serving platform: Worked on resource attribution and auto-scaling of model serving services, saving more than $1.5M in operating expenses.

GenAI for search: Worked on LinkedIn's embedding-based retrieval (EBR) within a centralized search service.

Presto Foundation

PrestoDB Committer

2022 — Now

Developer and maintainer of:

Presto router (https://github.com/prestodb/presto/tree/master/presto-router)

Presto-Iceberg connector (https://github.com/prestodb/presto/tree/master/presto-iceberg)

Presto ML-based query predictor (https://github.com/prestodb/presto-query-predictor)

Alluxio, Inc.

Staff Research Scientist - Distributed System

2023 — 2024

San Francisco Bay Area

Identified traffic patterns and proposed cache strategies for data analytics (SQL) and machine learning workloads. Our talk at Data + AI Summit 2023: https://youtu.be/wgr5Kdqa52Y

Led a unified cloud-native data access solution with Alluxio for end-to-end machine learning pipelines, which overcomes I/O challenges and improves GPU utilization to 90%+ in enterprise-grade ML workloads.

Led the collaboration with Uber on the adoption of caching to improve data analytics (Presto) & machine learning training (Ray) performance and cost efficiency.

Twitter

Senior Software Engineer

2019 — 2023

Tech lead of Twitter's Presto and Zeppelin, helping evolve Twitter's SQL federation system into a world-class large-scale system that processes ~10 PB of data daily. Also led and contributed to other cross-team or cross-functional projects, spanning a wide range of data systems, including BigQuery, Druid, and Spark/Neo4j for graph analytics.

Presto

Led Twitter's Presto federation (10+ Presto clusters with 3000+ nodes, processing ~10 PB of data daily).

Drove the creation of an end-to-end machine learning pipeline that learns from request logs to forecast the resource usage of Presto SQL queries (92%+ accuracy) and stores data exploration jobs in JupyterLab.

Contributed to the router and scheduler to improve system performance (P99 query queue time decreased by 90%).

BigQuery

Worked on a low-code ML project advocating the use of BigQuery ML to simplify ML pipeline components such as model training and the feature store. Helped rebuild Twitter's notification ML models and reduce the dislike rate by 2%+.

Zeppelin

Led Twitter's Zeppelin notebook service (~400 weekly active users).

Drove the migration of Twitter's on-premises Zeppelin to Kubernetes in the cloud (GCP) using a move-and-improve strategy.

Druid

Worked on a unified ingestion web service for Apache Druid, a real-time analytics database, to manage data ingestion jobs.

Graph Analytics

Contributed to the company-wide graph analytics project and evaluated a hybrid architectural design with Spark and Neo4j for next-generation large-scale graph analytics.

Futurewei Technologies, Inc.

Software Developer Internship

2018 — 2018

Santa Clara, CA

Participated in designing Huawei's large-scale mobile-edge-cloud synchronization and collaboration data platform.

Implemented an MBaaS (Mobile Backend as a Service) layer on top of heterogeneous data stores (table store, object store, etc.) to enforce strong consistency of CRUD operations on composite data objects.

Education

2015 — 2019

Syracuse University

Doctor of Philosophy (Ph.D.)

2013 — 2015

Syracuse University

Master's degree

2009 — 2013

Xiamen University

Bachelor of Engineering (B.E.)

Xiamen University

Bachelor of Economics (Minor)

2009 — 2013
