Backend Services
* Led the design and implementation of an open-source BI tool in Node.js, Kotlin and gRPC. It has fully replaced Metabase within our company and has earned 300+ stars on GitHub.
* Developed a Jupyter-based notebook service with customized kernel containers that can run Spark on Kubernetes.
* Built a podcast hosting platform in Node.js, backed by a serverless FFmpeg-based audio transcoding service.
* Engineered a distributed scraping system in Python and Node.js, leveraging RabbitMQ, Kafka and MongoDB. At peak times it handled 150k+ scraping rules concurrently.
* Competed in the ETH Shanghai Hackathon and won the Best Decentralized Identity Project award.
Data Engineering
* Refined our ETL platform, written in Java and Scala, which uses Airflow and Livy to schedule jobs on a Spark cluster on Amazon EMR. It runs 100 transformation tasks per day with complex dependency topologies.
* Developed a machine-learning model management platform in Python that stores models in Aliyun OSS and hosts them on Kubernetes.
* Developed several internal tools with Spring Boot and React for real-time monitoring of data pipelines and for ad-hoc queries.
* Reduced monthly downtime of our ad-hoc query service from around 10 hours to 30 minutes.
Infrastructure
* Enhanced our CI/CD system and maintained our Dockerfiles and Helm charts.
* Developed several Kubernetes operators and CRDs in Go to meet our business requirements.
* Took part in migrating our cluster from AWS to Aliyun.