Experience
2019 — Now
San Francisco Bay Area
ML Model Observability & Infrastructure
• Led development of a foundational ML model observability platform and metadata ingestion systems, enabling actionable insights across ML model lifecycle stages and operational health monitoring at LinkedIn scale
• Built an ML tracking sampling system that applies adaptive sampling strategies across billions of daily events, based on real-time model traffic patterns and feature usage metrics, to ensure capture of the critical ML model metadata and operational health data that power feature statistics and model performance analytics
Data Management (DataHub)
• Spearheaded data entity lifecycle management initiatives across LinkedIn's big data ecosystem, establishing robust data governance frameworks and compliance standards while delivering comprehensive dataset observability capabilities
• Drove strategic lineage initiatives and led a cross-functional team to establish scalable lineage graph foundations that connect 2+ million datasets across 20+ heterogeneous data platforms and orchestrate 1+ million data jobs
• Served as an early engineer on LinkedIn DataHub; architected and delivered offline dataset (e.g. HDFS, Hive, Iceberg) metadata ingestion and serving infrastructure, enabling seamless data compliance, granular access control, comprehensive lineage tracking, and automated ownership management for millions of datasets
• Led GraphQL federation implementation to automatically onboard service metadata (REST/gRPC), reducing manual onboarding effort from weeks of engineering work to hours and enabling seamless service discovery across the enterprise
• Pioneered integration of dataset metadata with the enterprise AI-powered search platform (Glean), establishing dataset metadata as a core component of LinkedIn's knowledge discovery ecosystem and enabling seamless data asset discoverability for data scientists and engineers organization-wide
2017 — 2019
San Francisco Bay Area
• Developed the Bot Conversation Platform (based on EC2, S3, DynamoDB, Redis Cache, Messaging Queue) to power popular bot channels (such as Google Assistant and Alexa) and provide shared functionality such as NLP (Intent/Entity Recognition), User Travel Management (Cancel/Confirm/Review), LOB Search (hotel/flight/car), Transaction (hotel booking/car rental), Notification (SMS/Push Notification), etc.
• Developed and launched the first Expedia Action for Google Assistant, giving customers the ability to use voice commands to search for LOB, book hotels, manage trips, explore city attractions, etc., and providing rich UI responses such as Carousel Card, Suggestion Chips, and Receipt Card.
• Developed the Expedia Alexa Skill, which allows users to speak with an Alexa-enabled device to ask travel-related questions, explore popular activities at destinations, manage booked trips, plan vacations, and receive attraction information via SMS when requested.
• Developed and integrated a voice-bot authentication system with the Expedia GSS model to allow users to link their Expedia account (via Facebook login and Gmail login) with the Alexa Skill and Google Action.
• Developed the Bot Logging Service (based on AWS Lambda Function, CloudWatch, and DynamoDB) to collect training data from user interaction and bot state migration to feed the NLP model and conversation smart router.
2015 — 2017
Santa Clara
• Developed cloud-based services to facilitate test scheduling, execution, and analysis, and to notify service consumers of unexpected failures or system errors via email.
• Developed the Shared Memory Manager, which manages memory segments shared by threads or processes and provides thread-safe interfaces to write/read/add/remove/get the segments.
• Improved and optimized the Memory Reservation Pool, a kernel memory cache for shared-memory segments, used by the Oracle database SGA to facilitate quick restart and by cloud instances to improve restart/resume performance.
Education
University of Southern California
Master's degree
Huazhong University of Science and Technology