## Major Valuable Projects in Career ##

At Credit Karma, I implemented a Capacity Data Pipeline that collects the data needed for capacity planning, improving planning efficiency and accuracy and saving significant engineering effort. I also completed a POC (Proof of Concept) leveraging Argo Rollouts in K8s for automated...
Experience
2021 — Now
1) Capacity Forecasting: Use domain knowledge and machine learning models to forecast annual capacity demand for various GCP services and projects, including GCE, GKE, BQ, GCS, etc., and then align with FP&A for budget planning.
2) Peak Capacity Sizing: Analyze and estimate seasonal peak capacity demand for each service to prevent capacity issues.
3) Cost Efficiency: Identify opportunities for binpacking and rightsizing in GKE to achieve cost savings without impacting service SLAs.
4) Site Issues: Follow up on site issues and troubleshoot performance to avoid unnecessary capacity additions.
5) Performance Optimization: Diagnose and resolve performance issues in services built on Java, Node.js, and TypeScript.
6) Stress Testing: Automate stress testing via Argo Rollouts in the production environment without affecting the user experience.
7) Capacity Recommendation Engine: Train machine learning models to power a capacity recommendation engine that streamlines capacity rightsizing for cost efficiency.
8) Capacity ETL Pipeline: Implement a capacity ETL pipeline that collects the data capacity and performance planning depend on from multiple sources at appropriate granularity, including K8s performance data, service traffic data, metadata, and more.
9) Capacity Knowledge Bot: Implement a capacity knowledge bot using an LLM on top of the data collected by the ETL pipeline to simplify cross-team communication and make it more efficient.
2017 — 2021
Redwood City, CA
1) Define DR model and capacity utilization model for site high availability and scalability.
2) Long-term and short-term capacity planning and forecasting for the entire infrastructure.
3) Work with Supply Chain team to ensure infrastructure purchases are properly planned to meet site growth demand and timeline.
4) Analyze site health periodically and proactively to avoid abnormal, non-linear capacity additions.
5) Site performance issue analysis and troubleshooting to improve capacity utilization efficiency.
6) Capacity rightsizing for all services to prevent wasteful server usage and save cost.
7) Build and operate capacity automation and analytics on multi-terabyte data sets of infrastructure-wide performance data to ensure efficient infrastructure scaling in public and private clouds.
8) Design a Capacity as a Service framework and build a capacity data warehouse to improve capacity planning efficiency.
9) Build a data center migration model on top of the performance data and metadata in the data warehouse to improve capacity utilization efficiency and minimize new server purchases.
10) Server SKU benchmark/stress testing for different application workloads and performance data analysis/comparison.
2015 — 2017
San Jose, CA
1) Capacity sizing estimation for all site components, including Oracle, SAN, NAS, Front End, middle tier, etc.
e.g., estimate how many resources are needed for a new project/feature or for a given increase in traffic.
2) Long-term capacity planning and forecasting: proactively estimate how many extra resources are required to meet each year's seasonal peak.
3) Performance dashboard development, including Oracle, SAN, NAS, Front End, middle tier, etc.
4) Capacity Self-Service tool development to automate and streamline capacity cost/impact estimation.
5) Correlation analysis between business and system performance metrics for more accurate forecasting.
6) Capacity workflow enhancement to better serve our customers and reduce manual effort.
2007 — 2014
Shanghai, China
1) Capacity sizing estimation for all site components, including Oracle, SAN, NAS, Front End, middle tier, etc.
e.g., estimate how many resources are needed for a new project/feature or for a given increase in traffic.
2) Long-term capacity planning and forecasting: proactively estimate how many extra resources are required to meet each year's seasonal peak.
3) Performance dashboard development, including Oracle, SAN, NAS, Front End, middle tier, etc.
4) Capacity Self-Service tool development to automate and streamline capacity cost/impact estimation.
5) Correlation analysis between business and system performance metrics for more accurate forecasting.
6) Capacity workflow enhancement to better serve our customers and reduce manual effort.
2006 — 2007
Shanghai, China
1) Oracle database management, troubleshooting, performance tuning and SQL review.
2) Oracle application server (OAS) management and deployment.
3) Deployment and maintenance of testing, staging, UAT, and production environments.
4) Technical support to customers, including Oracle database monitoring, tuning of heavy SQL, package deployment, and weekly database health check reports.
5) Project work, including table structure design and development of PL/SQL and shell scripts.