Experience: 12+ years of backend engineering, spanning offline data pipelines, search verticals, and most recently ads model serving infra (an area of RecSys inference) at Meta.
Experience
2022 — Now
Menlo Park, California, United States
Jun 2023 - present, Ads ML Serving Platform
Scope: GPU-powered serving of retrieval-stage ads models.
I have a track record of driving major advancements and solving complex engineering challenges in ads model serving infra. Examples:
* Designed and implemented AED (ad embedding on device). Notable complexities: a GPU-based hash table that allows O(10-100k) concurrent lookups, HBM/DRAM hybrid storage for ad embeddings that allows O(100M) ads in memory, and a vectorized memcpy that is 16x faster than CUDA native memcpy. Impact: alleviated the memory and CPU bottlenecks for major retrieval-stage ads models, unblocking their launches. AED has become a foundational building block for GPU model serving in the ads retrieval stage. Without it, several ongoing high-profile projects, such as TorchRetrieval, Neutron Star on AMD, Genre 3.0, and Stage Consolidation, would be blocked.
* Major contributor to the Neutron Star launch (the first GPU-based model serving system in ads serving infra). Served as predictor service TL and solved several hard problems (e.g., invented a novel solution to derisk Manifold IO throttling for NS model serving; mitigated the fallback SLA blocker).
* AMD model serving launch for the ads retrieval stage: served as TL and personally solved arguably the hardest problem, the CPU bottleneck on AMD.
Jan 2022 - May 2023, Metrics Computation Infra
2018 — 2022
Sunnyvale, CA
2019-04 to 2022-01: Search, LSS (Tech stack: lucene/java/scala/hadoop/spark/hive/kafka/gradle/git)
2018-10 to 2019-03: Salary Team, LTS (Tech stack: java/oracle/hadoop/kafka/couchbase/gradle/git/svn)
2016 — 2018
Sunnyvale, CA
Salary Team.
• Lead developer for the Salary Search project. Designed and built the backend for the salary search vertical on LinkedIn's Galene framework, including the design of the query expansion and ranking algorithms.
• Lead developer for company-level insights page.
• Optimized bulk data processing for the LinkedIn Salary backend, reducing latency from over one second to less than 100 milliseconds.
2014 — 2016
Sunnyvale, CA
• Built Yahoo's centralized audience data pipeline, which delivers user data and analytics for all Yahoo user traffic (tech stack: hadoop/pig/hive/oozie).
• Maintained and developed large-scale systems that ensure the safety and quality of ads served by Yahoo (tech stack: Java/Spring/Hibernate/MySQL/Oracle/Groovy/C++).
• Enabled geo-testing of ads by dynamically routing network traffic through external proxies (using Apache Traffic Server).
• Designed and built a creative review server using Netty.
• Designed and built the TNS Offline Service, which provides various reporting services and manages scheduled offline jobs (Jetty/J2EE).
2014 — 2014
• Key member of the Trust and Safety (TnS) team at Yahoo, responsible for maintaining and developing systems that ensure the safety and quality of ads served by Yahoo (tech stack: Java/Spring/Hibernate/MySQL/Oracle/Groovy/C++).
• Created a performance benchmark driver in Groovy with the following features:
• --- Uses concurrent threads to drive tests
• --- Collects responses from asynchronous (JMS-based) services
• --- Drives both HTTP-based and JMS-based services
This tool was used to benchmark the majority of applications owned by TnS.
• Ran performance benchmarks on most TnS applications and developed Python tools to analyze the results.
• Key contributor to developing and maintaining the continuous delivery pipeline at Yahoo (tech stack: Java/Groovy/Jenkins).
• Architected and implemented a highly scalable, end-to-end test framework for importing open-source Jenkins at Yahoo (tech stack: Java/Groovy/Selenium-WebDriver/PhantomJS/Cucumber). Thanks to its scalability, the framework greatly outperforms Yahoo's previous-generation framework. Key features include:
• --- Supports multiple browsers (Firefox, PhantomJS)
• --- Executes Cucumber scenarios in parallel with a configurable number of processes
• --- Selects which tests to run using annotations (for example, `gradle test -PtagFilter=smoke+ui` runs only tests annotated with both @Smoke and @Ui)
Education
University of Colorado Boulder
Doctor of Philosophy (Ph.D.)
University of Science and Technology of China