Burlingame, California, United States
• Train the Meta Ray-Ban voice assistant & evaluate its performance across all stages of development, for all (10+) non-US English markets
• Collaborate across teams to build adversarial/red-teaming datasets evaluating pre- and post-launch model safety
• Design & maintain pipelines for large, multilingual datasets, ensuring live & synthetic data are transformed, graded, & evaluated
• Compute model evaluation metrics & identify areas of improvement using Python, SQL, regex, and AI tools
• Conduct product experiments & iteratively evaluate model quality to ensure compliance with integrity standards & release deadlines
• Communicate model trends and actionable next steps to engineers, annotators, managers, cross-functional partners, & other stakeholders