# Chengxuan Cai

> SDE at NewsBreak

Location: Mountain View, California, United States

Profile: https://flows.cv/chengxuan

Currently working at NewsBreak. Previously worked as a data engineer from July 2021 to August 2023 in Tencent's fintech program, which is co-invested by China International Capital Corporation (CICC). The program aimed to deliver CICC's top-notch wealth management service to Tencent's 800 million WeChat users. My work focused on solving data engineering problems with a software approach.

1. Data infrastructure maintenance and development:
   - Maintained the company's private Hadoop-based distributed data cluster, built on CentOS Linux, to ensure on-time, accurate delivery of TB-level data.
   - Built a metadata management system based on Apache Atlas to ease data warehouse management. Wrote a custom Python program to synchronize the production cluster's metadata to the Atlas service. Supported column-level data lineage search covering over 100,000 columns and made table search 5 times faster.
   - Built Java and Python programs that use a SQL analyzer to analyze lineage relationships between Flink tables, easing stream data management.
   - Helped migrate the production data cluster to accommodate service growth. Participated in the planning, execution, and data integrity verification of the migration. Used my lineage management tool to speed up restarting data workflow computation on the new server.

2. Data processing and data warehouse building:
   - Supported the company's business intelligence, user persona, and data analysis services with code written in Python, Java, and SQL. Used various open-source big-data frameworks, including Hive, Spark, Flink, Impala, and MySQL, to meet different data demands.
3. Data preparation for custom LLM training and software development for an LLM-based chatbot service:
   - Collected and processed finance-related data from various sources, including online encyclopedias, test materials for finance certificates, finance product descriptions, and the company's database, into more than 200K pairs of simulated dialogue for LLM fine-tuning.
   - Applied the MVC (model, view, controller) pattern to build the LLM chatbot's chat service with the Python Flask framework, matching users' questions against a vector database.
   - Trained a custom SimCSE text2vec model on 340K sentences of finance dialogue data, consisting of data augmented by a large language model and dialogue between users and our company's consultants. The custom SimCSE model improved the accuracy of matching a user's question to the correct pre-answered question in the vector database from 70% to 95%.

## Work Experience

### Software Engineer @ NewsBreak

Jan 2025 – Present | Mountain View, California, United States

Software engineer in the data group.

### Software Engineer Intern @ TigerGraph

Jan 2024 – Jan 2024 | Houston, Texas, United States

Working as a software intern on the infrastructure team.

### Software Engineer(Data) @ 金腾科技信息(深圳)有限公司

Jan 2021 – Jan 2023 | Shenzhen, Guangdong, China

Worked as a data engineer in Tencent's joint venture program with CICC, a leading investment bank in China. As a data engineer, I work on developing the data warehouse/lake (based on Apache Iceberg) using tools like Hive, Spark, and Apache Flink to ensure the accuracy and availability of our company's data, and I support the business/data analysis group. My current job also involves reliability management of the company's private Hadoop/data lake cluster, metadata governance tool, and graph database.

### System Development Engineer @ Tencent

Jan 2020 – Jan 2020 | Shenzhen, Guangdong, China

- Work in the joint venture group of Tencent and CICC (金腾科技) to help manage the data that supports the company's finance services.
- Manage distributed transaction and user data on the company's Hadoop/Hive-based servers.
- Help build a Spark- and Hive-based machine learning platform on the company's newly built private HDFS cluster.
- Build data visualization and distributed machine learning tools in Scala/Python to help analyze data on the server.

## Education

### Master of Science - MS in Computer Science

Rice University

### Master's degree in Electrical Engineering

University of Southern California

### Bachelor of Science - BS in Master Degree-BS, Electrical Engineering

University of Southern California

## Contact & Social

- LinkedIn: https://linkedin.com/in/chengxuan-cai-26384512b

---

Source: https://flows.cv/chengxuan

JSON Resume: https://flows.cv/chengxuan/resume.json

Last updated: 2026-04-05