# Catherine Shen

> Software Engineer at Plaid

Location: San Francisco Bay Area, United States
Profile: https://flows.cv/catherineshen

Passionate about software engineering and proficient in Java and Python. Currently working on several cloud-based data applications, with work experience in machine learning, data engineering, and software engineering.

▪ Languages: Java, Python, Scala
▪ Data Engineering: MySQL, Cassandra, Hadoop, MapReduce, Spark, Pig, Hive, Kafka
▪ Tools and Platforms: AWS, GCP, Git, Bash, Docker, Django, Flask, Dash
▪ Front End: JavaScript, jQuery, HTML & CSS
▪ Data Visualization: Tableau, Power BI, D3.js

## Work Experience
### Software Engineer @ Plaid
Jan 2024 – Present | San Francisco Bay Area
Storage Infrastructure
Design, implement, and maintain robust distributed storage systems, optimizing scalable database solutions such as TiDB and MongoDB for large-scale data operations.

### Software Engineer @ Opendoor
Jan 2021 – Jan 2024 | San Francisco Bay Area
Data Platform / Data Infrastructure / Observability / DevOps
Change Data Capture / dbt / Data Quality Framework / Airflow / Kubernetes

### Senior Software Engineer @ Palo Alto Networks
Jan 2020 – Jan 2021 | San Francisco Bay Area
Building data infrastructure and a big data platform
• Backend: Java Spring, Kafka, Kubernetes, GCP
• Pipeline: Spark, Python, Airflow, Prometheus

### Software Engineer @ Earnin
Jan 2019 – Jan 2020 | San Francisco Bay Area
Infrastructure
Building an end-to-end data science platform
• AWS Kinesis, Spark, Lambda, DynamoDB, Kubeflow, Kubernetes, Airflow, Jenkins

### Alumni Consultant @ Insight Data Science
Jan 2019 – Jan 2020
Mentored data engineering fellows on their Insight projects.

### Data Engineering Fellow @ Insight Data Science
Jan 2019 – Jan 2019 | San Francisco Bay Area
Implemented a batch data processing platform using HDFS and Spark to analyze 3 TB of GitHub event data, helping users find social influencers within the GitHub network
• AWS, Spark, HDFS, S3, Airflow, Flask

### Graduate Teaching Assistant @ University of Maryland
Jan 2018 – Jan 2018 | Washington D.C. Metro Area
- Designed two labs and mentored graduate students for a Big Data course
• Cloud Computing Lab: AWS Lambda, Route 53, DynamoDB, SageMaker, and S3
• Apache Spark Lab: Introduction to Apache Spark, Spark ML, Spark Streaming

### Research Assistant @ University of Maryland
Jan 2017 – Jan 2018 | Washington D.C. Metro Area
- Designed a Human Affect Analytics Pipeline on AWS
Built data pipelines that collect, process, and compute emotion analysis using OpenCV in Python, and developed deep-learning models for automatic human emotion detection using OpenCV and TensorFlow

- Dockerized ML models and deployed them with AWS Lambda, S3, and EC2

- Wrote a Python package that automates the social network mining process using asynchronous programming and distributed web scraping

- Implemented sentiment analysis using scikit-learn, spaCy, and statsmodels

### Data Pipeline Engineer @ Fuchun Oriental Real Estate Investment
Jan 2015 – Jan 2017 | Guangzhou, China
- Data integration, data warehousing, ETL pipelines
• Python, SQL, SSIS, MS SQL Server


## Education
### Master of Science in Business Statistics in Data Science & Artificial Intelligence
University of Maryland

### Information System & Statistics
UCLA

### Data Engineering on Google Cloud Platform Specialization in Cloud Engineering
Google pour les pros

### Full Stack Web Developer Nanodegree in Computer Science
Udacity

### Bachelor’s Degree in Economics, Information Systems
Guangdong University of Foreign Studies


## Contact & Social
- LinkedIn: https://linkedin.com/in/chuqiao-catherine-shen
- Website: https://catherine-shen.medium.com/

---
Source: https://flows.cv/catherineshen
JSON Resume: https://flows.cv/catherineshen/resume.json
Last updated: 2026-03-23