# Chunxu Tang

> Ph.D., Staff Software Engineer @ LinkedIn, PrestoDB Committer

Location: San Francisco Bay Area, United States
Profile: https://flows.cv/chunxu

Staff-level software engineer and researcher specializing in large-scale data and AI infrastructure, with deep expertise in distributed systems, query processing, and ML-powered platforms at production scale.
Currently leading and contributing to key components of LinkedIn’s AI infrastructure, focusing on model serving, vector-based graph analytics, and large-scale retrieval systems.
Long-time open-source contributor with a track record of leading critical components in widely deployed data platforms.
Recipient of the IEEE IC2E Best Industry Track Paper Award for industry-impactful ML-driven data systems (https://arxiv.org/abs/2204.05529).

## Work Experience
### Staff Software Engineer @ LinkedIn
Jan 2024 – Present
• Spark tuning: Led the performance optimization of Spark-based data manipulation pipelines for search index building. Reduced the sales search data pipeline’s execution time from 7 hours to 3 hours.
• Managed model serving platform: Worked on resource attribution and auto-scaling of model serving services. Saved more than $1.5M in operating costs.
• GenAI for search: Worked on LinkedIn's embedding-based retrieval (EBR) within a centralized search service.

### PrestoDB Committer @ Presto Foundation
Jan 2022 – Present
Developer and maintainer of:
• Presto router (https://github.com/prestodb/presto/tree/master/presto-router)
• Presto-Iceberg connector (https://github.com/prestodb/presto/tree/master/presto-iceberg)
• Presto ML-based query predictor (https://github.com/prestodb/presto-query-predictor)

### Staff Research Scientist - Distributed System @ Alluxio, Inc.
Jan 2023 – Jan 2024 | San Francisco Bay Area
• Identified traffic patterns and proposed cache strategies for data analytics (SQL) and machine learning workloads. Our talk at Data + AI Summit 2023: https://youtu.be/wgr5Kdqa52Y
• Led a unified cloud-native data access solution with Alluxio for end-to-end machine learning pipelines, which overcomes I/O challenges and improves GPU utilization to 90%+ in enterprise-grade ML workloads.
• Led the collaboration with Uber on the adoption of caching to improve data analytics (Presto) & machine learning training (Ray) performance and cost efficiency.

### Senior Software Engineer @ Twitter
Jan 2019 – Jan 2023
Tech lead of Twitter's Presto and Zeppelin, helping evolve Twitter's SQL federation system into a world-class large-scale system that processes ~10 PB of data daily. Also led and contributed to other cross-team or cross-functional projects, spanning a wide range of data systems, including BigQuery, Druid, and Spark/Neo4j for graph analytics.

Presto
• Led Twitter's Presto federation (10+ Presto clusters with 3000+ nodes, processing ~10 PB of data daily).
• Drove the creation of an end-to-end machine learning pipeline that learns from request logs to forecast the resource usage of Presto SQL queries (92%+ accuracy) and stores data exploration jobs in JupyterLab.
• Contributed to the router and scheduler to improve system performance (P99 query queued time decreased by 90%).

BigQuery
• Worked on the low-code ML project to advocate using BigQuery ML to simplify ML pipelines across multiple components such as model training and the feature store. Helped rebuild Twitter's notification ML models and reduce the dislike rate by 2%+.

Zeppelin
• Led Twitter's Zeppelin notebook service (~400 weekly active users).
• Drove the migration of Twitter's on-premises Zeppelin to Kubernetes in the cloud (GCP), following a move-and-improve strategy.

Druid
• Worked on a unified ingestion web service for Apache Druid, a real-time analytics database, to manage data ingestion jobs.

Graph Analytics
• Contributed to the company-wide graph analytics project and evaluated a hybrid architectural design with Spark and Neo4j for next-generation large-scale graph analytics.

### Software Developer Internship @ Futurewei Technologies, Inc.
Jan 2018 – Jan 2018 | Santa Clara, CA
• Participated in designing Huawei's large-scale mobile-edge-cloud synchronization and collaboration data platform.
• Implemented an MBaaS (Mobile Backend as a Service) layer on top of heterogeneous data stores (table store, object store, etc.) to enforce strong consistency of CRUD operations on composite data objects.

### Software Developer Internship @ Google Summer of Code
Jan 2017 – Jan 2017
Google Summer of Code at Jitsi

• Developed a face recognition/tracking feature for Jitsi-Meet, a simple and scalable video conferencing web application implemented in React and Redux.
• Integrated a Google Calendar feature into the Jitsi-Meet Electron application.


## Education
### Doctor of Philosophy (Ph.D.) in Computer Engineering
Syracuse University
Jan 2015 – Jan 2019

### Master's degree in Computer Engineering
Syracuse University
Jan 2013 – Jan 2015

### Bachelor of Engineering (B.E.)
Xiamen University
Jan 2009 – Jan 2013

### Bachelor of Economics (Minor)
Xiamen University
Jan 2009 – Jan 2013


## Contact & Social
- LinkedIn: https://linkedin.com/in/chunxu-tang

---
Source: https://flows.cv/chunxu
JSON Resume: https://flows.cv/chunxu/resume.json
Last updated: 2026-03-22