# Marshall Z.

> Software Engineer @ Meta | ML Infrastructure & AI Platform Engineering | Building tooling for model serving, GPU capacity management & LLM-powered applications

Location: Redwood City, California, United States

Profile: https://flows.cv/marshall

Software Engineer at Meta building AI/ML infrastructure that powers model serving and capacity management at scale. I work at the intersection of platform engineering and AI, from building self-service tooling that helps ML engineers manage GPU resources and model deployments to designing LLM-powered conversational AI systems in immersive environments.

My experience spans the full stack: React and TypeScript frontends, GraphQL and REST APIs, Golang and Python backends, and distributed data systems (Kafka, Cassandra, Redis). I've shipped products across ML infrastructure, generative AI, and developer tooling, always with a focus on measurable impact and cross-functional collaboration.

Previously, I built data ingestion systems at InsightFinder (AIOps) and a graph database development toolkit at TigerGraph, giving me a strong foundation in distributed systems, data pipelines, and cloud-native engineering. I'm passionate about building developer-facing platforms that make complex AI systems accessible and manageable.

## Work Experience

### Software Engineer, ML Infrastructure - Inference Platform @ Meta

Jan 2025 – Present | Menlo Park, CA

Building internal platform tooling that enables ML engineers across Meta to manage GPU capacity, model serving quotas, and inference resources at scale.

- Led the end-to-end design and execution of the Quota Wizard, a self-service tool that simplified how ML engineers allocate and transfer GPU serving capacity. Drove adoption from 25% to 70.4% of all quota moves through MLHub, and turned what was previously a multi-step, hours-long manual process into a single automated request that completes in minutes.
  Hosted training sessions for GenAI and RecSys teams and connected directly with ML platform leaders to iterate on feedback.
- Executed E2E Capacity Accounting across ML infrastructure in collaboration with cross-functional teams and MLHub. Introduced model-type as a new GPU tracking unit, unifying capacity queries across Facebook-Taxonomy and Model Type and giving platform customers a single, consistent view of their resource allocation.
- Led the Solver Explainability project, building a self-service "Capacity Tab" that enables ML engineers to debug capacity allocation decisions independently. Collaborated with Capacity team engineers to refine upstream data pipelines and designed the UX for surfacing allocation logic, reducing dependency on manual support.
- Designed and shipped a Quota Usage Dashboard serving 700+ unique users with 285 daily events. The dashboard surfaced previously undiscoverable model-level resource utilization data with aggregated filtering and hover-card model details, enabling ML engineers to make informed capacity decisions without navigating multiple disconnected data sources.
- Built a Permission Management UX for the inference capacity platform, working directly with designers to tailor the experience for ML platform customers. Drove over 600 permission updates in H2 2025, enabling teams to manage access controls without filing manual requests.

### Software Engineer, Conversational AI & LLM - Horizon Worlds @ Meta

Jan 2024 – Jan 2025 | San Francisco Bay Area

Built the developer-facing APIs and AI systems that power LLM-driven NPC characters for creators in Meta Horizon Worlds.

- Drove the creation of the `horizon/npc` TypeScript SDK, a new module offering 15 API methods that enable game creators to build LLM-powered conversational AI characters with custom game logic, dialogue control, and embodiment behaviors. Proposed and aligned the data structure for access control in the TS class with 4 stakeholders across 3 teams.
- Led the AI Transparency launch, a cross-platform initiative required for legal and policy compliance of generative AI content. This included three interconnected systems: automated AI World tagging that programmatically labels AI-generated content, a policy acknowledgment NUX ensuring users consent to AI interaction terms, and an NPC Engagement Reception system built with Unity shaders in C# that visually guides users through 6 interaction stages with 4 distinct icons.
- Built the initial LLM fallback system spanning WWW, C#, and TypeScript, ensuring graceful degradation when model inference fails or returns unexpected results. This was a critical unblock for the Shootball flagship game launch in H1 2024.
- Led the youth safety strategy for AI-generated content, designing and coordinating access controls to exclude users under 13 from LLM-powered NPC interactions. This required alignment across multiple cross-functional teams including policy, legal, and platform safety.
- Drove Scripted Avatars to broad creator adoption: updated the avatar editing preview and VR creation flows, and configured roll-out GKs for 2P and MHCP access.
- Reduced the HUR bug count by 40% as Bug POC, created the NPC Team Testing guidance doc, and served as Eng Excellence Champion maintaining test coverage above 70%. Drove AI Policy QED coverage and fixed critical issues for a Bobber Bay demo at Connect.

### Software Engineer @ InsightFinder Inc.

Jan 2023 – Jan 2024 | Raleigh-Durham-Chapel Hill Area

Feature owner of Service Map and Dashboard in the InsightFinder SaaS application.

- Developed a data collector agent (Golang) to collect a variety of metric and log data from customer-provided APIs, AWS S3, or Kubernetes. Employed goroutines to optimize agent efficiency and maximize data collection throughput.
- Designed and implemented multiple data pipelines handling data ingestion from various sources.
  Implemented RESTful APIs for third-party integrations, and managed authentication using OAuth, JWT, and mTLS.
- Used Cassandra as the data store for efficient and reliable data persistence, ensuring data durability and integrity.
- Spearheaded efforts to enhance system-wide observability. Developed and integrated logic to collect data across different categories, enabling comprehensive data aggregation and providing valuable insights for stakeholders.

### Full Stack Engineer @ TigerGraph

Jan 2021 – Jan 2023 | Redwood City, CA

GraphStudio, the graph database development toolkit, is a highly interactive application built with Angular, TypeScript, Gin, and Golang. It processes user interaction input and transforms it into data models that the backend database understands.

- Ran spikes with Postman for new features in the early development stage and collaborated with upstream teams (core database, infrastructure) to architect RESTful APIs.
- Followed a test-driven development methodology. Added unit tests (Jasmine) and automated integration tests (Protractor).
- Improved cloud CI/CD, making releases 2 times faster than the original process. Integrated cloud-customized code/functions (TypeScript, Golang) into the main product using build flags and environment variables.
- Implemented a new visualization library for ETL data transformation. Refactored the data model transformation and rendering logic.

#### Data Loading Error Report

- Boosted user data loading efficiency by 3 times with a new UI that consumes results from the Kafka data pipeline.
- Polled the backend internal API with recursive HTTP requests every 10 seconds and used a lock to handle multiple loading results returned by the distributed system.
- Kept the results in the key-value store (etcd) with a cache and dynamically displayed error messages in the UI.

#### Data Ingestion from Cloud Platforms

- Simplified the data-in logic and reduced user data importing time by 80% with a new UI workflow.
  Supported data ingestion from AWS S3, GCP (Google Cloud Platform), and ABS (Azure Blob Storage).
- Collected basic login credentials in the UI and transformed them into data models used in the backend (Golang) to create the internal Kafka connector.

#### Day to Day Work

- Worked with designers and PMs to conduct user interviews and usability tests with customers and potential users using Figma.
- Followed the Agile sprint development life cycle and worked on the whole process of CI/CD pipeline development in a Docker environment.

### Full Stack Engineer @ PROJECT GUTENBERG LITERARY ARCHIVE FOUNDATION INC TR

Jan 2020 – Jan 2021 | Chapel Hill, NC

- Participated in daily maintenance of the Project Gutenberg production site and the construction of the new Project Gutenberg site. The production site has over 6 million downloads per month. My role was equivalent to that of a full stack engineer.
- Compiled .htaccess files on a CentOS 7 server to manage site links for the PG development site. Assisted with the deployment of a new CentOS 8 server, using Ansible for PHP and Apache installation and firewall setup.
- Used SQLAlchemy and PostgreSQL to implement an ORM in Libgutenberg, greatly easing maintenance. Investigated and fixed a long-standing anomaly in search results on the PG development site; the problem was caused by the Genshi template used by the PG search engine, Autocat3.
- Tested and improved the CSS and HTML5 design of the Gutenberg site based on a survey of over 200 users. Improved the taxonomy structure and redesigned the Bookshelf cataloging function using Django on the new PG site.

### Research Assistant @ CAHL Lab at UC Berkeley

Jan 2018 – Jan 2019 | Berkeley, CA

- Scraped the initial articulation data set from assist.org using Beautiful Soup and specified all edge cases.
- Concatenated the primary keys of over a thousand courses from two different datasets using Pandas and regular expressions.
- Built a model generator in Python that can generate 3 models (BOW, TF-IDF, and Doc2Vec) for different text inputs.
- Proposed a new metric for the models and comprehensively evaluated it against the 2 existing ones.
- Generated 15 proposed articulation pairs for each unarticulated course (188 in total).

## Education

### Master of Science in Information Science (MSIS), Information Science/Studies

The University of North Carolina at Chapel Hill | Jan 2019 – Jan 2021

### Bachelor's exchange student

University of California, Berkeley | Jan 2018 – Jan 2018

### Bachelor of Management in Management Information Systems, General

Tianjin University | Jan 2015 – Jan 2019

## Contact & Social

- LinkedIn: https://linkedin.com/in/marshall-z-162118191

---

Source: https://flows.cv/marshall

JSON Resume: https://flows.cv/marshall/resume.json

Last updated: 2026-03-22