# Joe Redmond

> Software Engineer, LLM Infrastructure

Location: San Francisco, California, United States
Profile: https://flows.cv/joeredmond

I've spent my career making ML systems reliable in places where unreliable ones cause real harm: a cancer hospital, a clinical data company, and now a healthcare AI startup running millions of patient conversations.

My focus is LLM infrastructure: inference platforms, semantic observability, and the evaluation systems that tell you whether a model is behaving the way you think it is. I believe the question "Is the model doing what we think it's doing?" is one of the most important in AI right now.

## Work Experience
### Senior Software Engineer, LLM Infrastructure @ Hippocratic AI
Jan 2024 – Present | Palo Alto, California, United States
• Built an automated eval-driven release gating system for LLM serving changes, enforcing promotion/rollback decisions via side-by-side evaluation of quality, latency, and behavioral drift metrics under production load.

• Designed and built a real-time semantic observability service for LLM drift detection, comparing online logprob distributions against calibration baselines using non-parametric statistical methods to flag silent hardware faults and behavioral changes.

• Built a low-latency LLM inference platform using SGLang on Kubernetes, leveraging sticky routing to optimize KV-cache hit rate and prefix overlap across a multi-cloud GPU federation.

• Designed a forecast-driven proactive scaling control plane to align H200 GPU capacity with projected demand, virtually eliminating cold starts and meeting conversational latency SLAs (400 ms p90). This enabled the company to scale from 100k to 2M phone calls per month.

### Senior Software Engineer, Machine Learning @ Flatiron Health
Jan 2024 – Jan 2024 | San Francisco, California, United States
• Parallelized PyTorch inference runtime by decoupling tokenization from model execution, improving GPU utilization and increasing throughput by 25% ($100k/month savings).

• Scaled BERT-based NER inference from thousands to billions of clinical notes by redesigning pipelines for distributed execution using Spark.

### Software Engineer, Machine Learning @ Flatiron Health
Jan 2022 – Jan 2024 | San Francisco, California, United States
• Replaced ad-hoc manual inference runs with production-grade Airflow pipelines, enabling reliable reprocessing of billions of clinical notes with full auditability.

• Optimized database snapshot infrastructure using delta-based updates and columnar indexing, enabling 1000x faster reads and supporting hundreds of concurrent queries.

### Data Scientist @ Memorial Sloan Kettering Cancer Center
Jan 2019 – Jan 2022 | New York, NY
Trained and deployed a surgery duration prediction model into Epic EHR, reducing mean absolute error (MAE) by 17% relative to the baseline and saving hundreds of nurse hours per week. The system remains in production.

### TA, Artificial Intelligence (COMS 4701) @ Columbia University
Jan 2021 – Jan 2021 | New York, New York, United States
Teaching Assistant for the Artificial Intelligence class under Dr. Ansaf Salleb-Aouissi (co-taught by Dr. Tony Dear).
• Taught ML concepts (SVMs, random forests) and search algorithms (heuristic, adversarial, backtracking) in Python during weekly office hours.
• Graded student work, both coding and conceptual.


## Education
### B.S.E. in Biological Engineering (CBE), minor in Computer Science (PAC)
Princeton University

### M.S. in Computer Science, Machine Learning concentration
Columbia University


## Contact & Social
- LinkedIn: https://linkedin.com/in/joearedmond

---
Source: https://flows.cv/joeredmond
JSON Resume: https://flows.cv/joeredmond/resume.json
Last updated: 2026-04-05