Traced the critical request path across 18 microservices in a distributed system, created Grafana dashboards tracking the performance of 50+ gRPC methods, and drove a company-wide initiative for CUJ-based SLO adoption
• Developed availability and latency SLOs for the Reddit site using in-house SLO tooling, adding metrics, Grafana dashboards, and alerting rules based on Google SRE best practices to help engineering teams respond to incidents more effectively
• Identified key SLIs and SLOs and instrumented Reddit’s service catalog with Prometheus to improve its observability, then analyzed performance and increased its data freshness by 33%
• Analyzed Kubernetes instance lifecycles to uncover the root causes of GraphQL 5xx errors
Managed the software development life cycle for Reddit's in-house service catalog, incorporating Spotify's open-source Backstage platform and consolidating data from GitHub, Slack, Sentry, and other sources into a unified, trusted system used by 500+ engineers
• Enriched the service directory with metadata stored in AWS S3 using a Python script
• Created reusable React components in TypeScript to display metadata in the frontend, increasing microservice observability and accountability
Education
Cornell University
Master of Engineering - MEng
Cornell University
Bachelor's degree
High School Affiliated to Nanjing Normal University