AI-native Product Data Scientist owning measurement + evaluation loops for Wearables GenAI/CoreUX, driving phone-replacement adoption, task success, and retention across 10+ product teams.
▸ Defined Phone Replacement Rate as the org-level north star for phone-free high-value tasks (media, capture, comms, navigation); built event taxonomy + logging spec and launch governance, raising coverage 0%→90% for consistent cross-feature measurement. Holdout analysis showed that users reaching ≥3 successful phone-free tasks had +5pp higher D7 post-activation retention, informing roadmap priorities.
▸ Built an LLM-judge labeling framework for 15 voice-initiated actions (~50K/week), generating success/failure labels + a failure taxonomy, with QC via 3% spot-checks + adjudication. Published a recurring quality report as the measurement standard for roadmap decisions, driving root-cause fixes (routing, vocabulary) and raising task success by 20pp.
▸ Led analytics for LiveAI (camera-based multimodal assistant); ran use-case mining + retention analysis across ~3K daily sessions to surface repeat-use scenarios and drive PMF improvements for a major 2025 GTM campaign. Defined latency + reliability metrics and built a monitoring dashboard; insights drove the decision to delay launch by 2 months to reach 99% reliability.
▸ Designed pillar-level experimentation governance (hypotheses, success/guardrail metrics, readout standards) across 10 CoreUX & Inputs engineering teams, reducing coordination overhead and preventing cherry-picking. Drove launch decisions across Wearables: 10 launches and 5 pivots.