Palo Alto, California, United States
• Built an automated eval-driven release gating system for LLM serving changes, enforcing promotion/rollback decisions via side-by-side evaluation of quality, latency, and behavioral drift metrics under production load.
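The promotion/rollback logic in the bullet above can be sketched roughly as follows; the metric names, tolerances, and decision rules here are illustrative assumptions, not the actual system:

```python
def gate_release(baseline: dict, candidate: dict,
                 max_quality_drop: float = 0.01,
                 max_latency_regression_ms: float = 20.0) -> str:
    """Release-gating sketch: promote a candidate serving change only if it
    does not regress quality or p90 latency beyond tolerance versus the
    baseline measured side by side. Metric keys are hypothetical."""
    if candidate["quality"] < baseline["quality"] - max_quality_drop:
        return "rollback"
    if candidate["p90_latency_ms"] > baseline["p90_latency_ms"] + max_latency_regression_ms:
        return "rollback"
    return "promote"
```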
• Designed and built a real-time semantic observability service for LLM drift detection, comparing online logprob distributions against calibration baselines using non-parametric statistical methods to flag silent hardware faults and behavioral changes.
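A minimal sketch of the kind of non-parametric comparison described above, using a two-sample Kolmogorov–Smirnov statistic on logprob samples (the specific test and threshold are assumptions; the bullet does not name them):

```python
import bisect

def ks_statistic(sample_a: list, sample_b: list) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap between
    the two samples' empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def drift_alert(baseline_logprobs: list, online_logprobs: list,
                threshold: float = 0.15) -> bool:
    # Flag when the online logprob distribution has shifted away from
    # the calibration baseline (threshold is illustrative).
    return ks_statistic(baseline_logprobs, online_logprobs) > threshold
```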
• Built a low-latency LLM inference platform using SGLang on Kubernetes, using sticky routing to exploit prefix overlap and maximize KV-cache hit rates across a multi-cloud GPU federation.
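Sticky routing of the sort described above can be sketched as hashing a fixed-length prompt prefix to pick a replica, so requests sharing a prefix reuse the same KV cache; the prefix length and hash choice here are assumptions:

```python
import hashlib

def route_by_prefix(prompt: str, replicas: list, prefix_tokens: int = 32) -> str:
    """Sticky-routing sketch: hash the first `prefix_tokens` whitespace tokens
    of the prompt so requests with a shared prefix land on the same replica
    and hit its warm KV cache (parameters are illustrative)."""
    prefix = " ".join(prompt.split()[:prefix_tokens])
    h = int(hashlib.sha256(prefix.encode()).hexdigest(), 16)
    return replicas[h % len(replicas)]
```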
• Designed a forecast-driven proactive scaling control plane to align H200 GPU capacity with projected demand, virtually eliminating cold starts to meet a 400 ms p90 conversational latency SLA and enabling the company to scale from 100k to 2M phone calls per month.
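The proactive-scaling decision above reduces to provisioning for forecast demand plus headroom before traffic arrives; this sketch assumes a per-GPU throughput figure and headroom fraction, neither of which is stated in the bullet:

```python
import math

def target_gpu_count(forecast_calls_per_min: float,
                     calls_per_gpu_per_min: float,
                     headroom: float = 0.2,
                     min_gpus: int = 1) -> int:
    """Forecast-driven scaling sketch: size the warm GPU pool to projected
    demand plus a headroom buffer so cold starts are avoided. All
    parameters are illustrative assumptions."""
    needed = forecast_calls_per_min * (1 + headroom) / calls_per_gpu_per_min
    return max(min_gpus, math.ceil(needed))
```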