• Developed an ETL pipeline with 30+ dependencies to process collected raw data and store it in HBase using HiveSQL. Built 4 DM tables at different aggregation levels to meet downstream needs. Separated DW data from business logic. Supported the data needs of 200+ colleagues on the data science and operations teams.
• Maintained the event-tracking ETL pipeline and reported 10 bugs to front-end developers to ensure data quality.
• Improved query efficiency by about 50% by implementing a database sharding strategy on a 2,000 GB database.
• Collaborated with team members to implement a regression model that evaluates the incremental GMV from sales promotion tools, using SQL and Python, and visualized the results with matplotlib. Deployed the model to run daily updates.