# Chunxu Tang

> Ph.D., Staff Software Engineer @ LinkedIn, PrestoDB Committer

Location: San Francisco Bay Area, United States

Profile: https://flows.cv/chunxu

Staff-level software engineer and researcher specializing in large-scale data and AI infrastructure, with deep expertise in distributed systems, query processing, and ML-powered platforms at production scale. Currently leading and contributing to key components of LinkedIn’s AI infrastructure, focusing on model serving, vector-based graph analytics, and large-scale retrieval systems. Long-time open-source contributor with a track record of leading critical components in widely deployed data platforms. Recipient of the IEEE IC2E Best Industry Track Paper Award for industry-impactful ML-driven data systems (https://arxiv.org/abs/2204.05529).

## Work Experience

### Staff Software Engineer @ LinkedIn

Jan 2024 – Present

- Spark tuning: Led the performance optimization of Spark-based data manipulation pipelines for search index building. Reduced the sales search data pipeline’s execution time from 7 hours to 3 hours.
- Managed model serving platform: Worked on resource attribution and auto-scaling of model serving services. Saved >$1.5M in opex.
- GenAI for search: Worked on LinkedIn's embedding-based retrieval (EBR) within a centralized search service.

### PrestoDB Committer @ Presto Foundation

Jan 2022 – Present

Developer and maintainer of:

- Presto router (https://github.com/prestodb/presto/tree/master/presto-router)
- Presto-Iceberg connector (https://github.com/prestodb/presto/tree/master/presto-iceberg)
- Presto ML-based query predictor (https://github.com/prestodb/presto-query-predictor)

### Staff Research Scientist - Distributed System @ Alluxio, Inc.

Jan 2023 – Jan 2024 | San Francisco Bay Area

- Identified traffic patterns and proposed cache strategies for data analytics (SQL) and machine learning workloads.
  Our talk at Data + AI Summit 2023: https://youtu.be/wgr5Kdqa52Y
- Led a unified cloud-native data access solution with Alluxio for end-to-end machine learning pipelines, which overcomes I/O challenges and improves GPU utilization to 90%+ in enterprise-grade ML workloads.
- Led the collaboration with Uber on adopting caching to improve the performance and cost efficiency of data analytics (Presto) and machine learning training (Ray).

### Senior Software Engineer @ Twitter

Jan 2019 – Jan 2023

Tech lead of Twitter's Presto and Zeppelin, helping evolve Twitter's SQL federation system into a world-class large-scale system that processes ~10 PB of data daily. Also led and contributed to other cross-team and cross-functional projects spanning a wide range of data systems, including BigQuery, Druid, and Spark/Neo4j for graph analytics.

#### Presto

- Led Twitter's Presto federation (10+ Presto clusters with 3000+ nodes, processing ~10 PB of data daily).
- Drove the creation of an end-to-end machine learning pipeline that learns from request logs to forecast the resource usage of Presto SQL queries (92%+ accuracy) and stores data exploration jobs in JupyterLab.
- Contributed to the router and scheduler to improve system performance (P99 query queued time decreased by 90%).

#### BigQuery

- Worked on the low-code ML project, advocating the use of BigQuery ML to simplify ML pipelines across multiple components such as model training and the feature store. Helped rebuild Twitter's notification ML models and reduce the dislike rate by 2+%.

#### Zeppelin

- Led Twitter's Zeppelin notebook service (~400 weekly active users).
- Drove the migration of Twitter's on-premises Zeppelin to Kubernetes in the cloud (GCP) using a move-and-improve strategy.

#### Druid

- Worked on a unified ingestion web service for Apache Druid, a real-time analytics database, to manage data ingestion jobs.

#### Graph Analytics

- Contributed to the company-wide graph analytics project and evaluated a hybrid architectural design with Spark and Neo4j for next-generation large-scale graph analytics.

### Software Developer Internship @ Futurewei Technologies, Inc.

Jan 2018 – Jan 2018 | Santa Clara, CA

- Participated in designing Huawei's large-scale mobile-edge-cloud synchronization and collaboration data platform.
- Implemented an MBaaS (Mobile Backend as a Service) layer on top of heterogeneous data stores (table store, object store, etc.) to enforce strong consistency of CRUD operations on composite data objects.

### Software Developer Internship @ Google Summer of Code

Jan 2017 – Jan 2017

Google Summer of Code at Jitsi

- Developed a face recognition/tracking feature for Jitsi-Meet, a simple and scalable video conferencing web application implemented in React and Redux.
- Integrated a Google Calendar feature into the Jitsi-Meet Electron application.

## Education

### Doctor of Philosophy (Ph.D.) in Computer Engineering

Syracuse University

Jan 2015 – Jan 2019

### Master's degree in Computer Engineering

Syracuse University

Jan 2013 – Jan 2015

### Bachelor of Engineering (B.E.)

Xiamen University

Jan 2009 – Jan 2013

### Bachelor of Economics (Minor)

Xiamen University

Jan 2009 – Jan 2013

## Contact & Social

- LinkedIn: https://linkedin.com/in/chunxu-tang

---

Source: https://flows.cv/chunxu

JSON Resume: https://flows.cv/chunxu/resume.json

Last updated: 2026-03-22