Staff Software Engineer with 8+ years building distributed infrastructure and AI agent systems at scale. Creator of a scalable Go autonomous agent platform with 1,860+ MCP tool endpoints, durable execution, state checkpointing, fault recovery, and sandboxed compute isolation serving 17 enterprise customers.
Experience
2025 — Now
Burlingame, California, United States
AI observability and evaluation platform for enterprise ML teams.
• Designed and built a large-scale Go autonomous agent platform (1,860+ MCP tool APIs across
72 modules) with sandboxed execution environments, state checkpointing, fault recovery, and
trust-boundary isolation – recognized at All-Hands as "most complex agent built at Galileo."
• Served as Incident Lead for 4 P0 production incidents (6,200+ stuck jobs, Redis
misconfigurations across 3 clusters, production API outages), resolving all within 24 hours
with documented root causes and permanent fixes.
• Implemented agent execution infrastructure and auth frameworks – OAuth-based RBAC across 17
customer environments, tiered trust boundaries, rate limiting and circuit breakers for 15+
external integrations, and OpenTelemetry metrics with 12 Grafana dashboards.
• Built a durable execution engine with 5-minute checkpointing, atomic state writes,
fault-tolerant consensus, and batch API optimization (50% cost reduction, 90% prompt cache
hit rate) – enabling 24-hour autonomous execution cycles with crash recovery across 23
concurrent agent workers.
• Led enterprise customer migration from v1 to v2 with a Go data-migration CLI,
ClickHouse/Postgres backup-restore, DNS cutover automation, and Grafana tracking – zero data
loss, zero unplanned downtime.
• Managed deployments for 17 enterprise customers including Fortune 20 and Fortune 50
companies across AWS, GCP, and Azure, driving CVE remediation, GPU model bundle
configuration, and TLS 1.3 hardening.
2023 — 2024
San Francisco, California, United States
AI platform for NLP and document understanding, serving defense and intelligence customers.
• Automated ML workflow deployments using GitHub Actions, transitioning 12+ production ML
deliverables from manual Kubernetes deployments to ArgoCD automated workflows, eliminating
deployment toil.
• Consolidated development infrastructure by 43% and parallelized ML workloads on GPU
instances, reducing build times by 80%.
• Developed an IaC platform with GitOps automation, reducing infrastructure fulfillment from
weeks to under 10 minutes (93% reduction), enabling self-service provisioning for ML research
and sales MVPs.
• Designed and developed PrimerCLI (Bash, Python, Go), a centralized developer toolkit
adopted by 100 engineers, reducing manual processes by 1,200 hours annually.
• Led the design and implementation of the Kubernetes platform for the company's production
migration to Azure, reaching 46% completion and delivering $2.8M in cost savings.
2021 — 2023
San Francisco, California, United States
A space technology company providing SAR imagery and satellite solutions for commercial and
government use.
• Reduced build times by 80% by creating a unified toolkit that eliminated 8,000 lines of
redundant code, containerized CI processes, and consolidated GitLab pipeline modules.
• Overhauled cloud infrastructure using IaC, improving code delivery reliability by 30% and
building a centralized tools cluster that enabled local-parity development and testing
environments.
• Built a redundant container registry for production systems, preventing satellite imagery
pipeline outages during peak collection windows.
2020 — 2021
Cloud governance automation solutions specializing in Security-as-Code to prevent data
breaches.
• Built the company's first CI/CD pipeline using Terraform-based IaC (300 commits), integrating
AWS CloudFormation and Azure deployments and enabling 1.5+ production deliveries per day.
• Reduced production errors by 50% through expanded end-to-end testing, frontend and unit
test coverage, and automated monitoring and alerting.
2018 — 2020
Mountain View, CA
• Built a speech model evaluation system adopted by 300 engineers and 80 PMs — comparing
quality data across usage history, forecasting performance, and shortening release cycles
through automated tooling.
• Developed automated Python tooling that cut 800 hours from data collection, using ML models
to detect user sentiment, map user journeys, and surface pain points.
• Sole designer of mouse-to-keyboard interaction for Google Assistant on web — filed for U.S.
patent.
• Managed GPU-focused cloud workloads and CI runners for speech model training and evaluation
pipelines.
• Led infrastructure migration enabling 100K+ Google employees to work remotely during
COVID-19 — maintaining communications, troubleshooting engineering environments, and adapting
internal tools for external access.
• Modernized ML build/release documentation for Assistant (5 years outdated, 2 tech stack
changes) — met with dozens of engineers to map processes for bleeding-edge ML release cycles.
• Volunteered for Google's AI Ethics advisory group (2019-2020), contributing to discussions
on responsible AI development, model interpretability, and evaluation standards.
Education
American River College
Associate's
California State University-Sacramento