# Joe Redmond

> Software Engineer, LLM Infrastructure

Location: San Francisco, California, United States

Profile: https://flows.cv/joeredmond

I've spent my career making ML systems reliable in places where unreliable ones cause real harm: a cancer hospital, a clinical data company, and now a healthcare AI startup running millions of patient conversations. My focus is LLM infrastructure: inference platforms, semantic observability, and the evaluation systems that tell you whether a model is behaving the way you think it is. I believe that question, "is the model doing what we think it's doing?", is one of the most important in AI right now.

## Work Experience

### Senior Software Engineer, LLM Infrastructure @ Hippocratic AI

Jan 2024 – Present | Palo Alto, California, United States

- Built an automated eval-driven release gating system for LLM serving changes, enforcing promotion/rollback decisions via side-by-side evaluation of quality, latency, and behavioral drift metrics under production load.
- Designed and built a real-time semantic observability service for LLM drift detection, comparing online logprob distributions against calibration baselines using non-parametric statistical methods to flag silent hardware faults and behavioral changes.
- Built a low-latency LLM inference platform using SGLang on Kubernetes, leveraging sticky routing to optimize KV-cache hit rate and prefix overlap across a multi-cloud GPU federation.
- Designed a forecast-driven proactive scaling control plane to align H200 GPU capacity with projected demand, virtually eliminating cold starts to meet conversational latency SLAs (400 ms p90) and enabling the company to scale from 100k to 2M phone calls per month.

### Senior Software Engineer, Machine Learning @ Flatiron Health

Jan 2024 – Jan 2024 | San Francisco, California, United States

- Parallelized PyTorch inference runtime by decoupling tokenization from model execution, improving GPU utilization and increasing throughput by 25% ($100k/month savings).
- Scaled BERT-based NER inference from thousands to billions of clinical notes by redesigning pipelines for distributed execution using Spark.

### Software Engineer, Machine Learning @ Flatiron Health

Jan 2022 – Jan 2024 | San Francisco, California, United States

- Replaced ad-hoc manual inference runs with production-grade Airflow pipelines, enabling reliable reprocessing of billions of clinical notes with full auditability.
- Optimized database snapshot infrastructure using delta-based updates and columnar indexing, enabling 1000x faster reads and supporting hundreds of concurrent queries.

### Data Scientist @ Memorial Sloan Kettering Cancer Center

Jan 2019 – Jan 2022 | New York, NY

Trained and deployed a surgery duration prediction model into Epic EHR, improving scheduling accuracy by 17% MAE over the baseline and saving hundreds of nurse hours per week. The system remains in production.

### TA, Artificial Intelligence (COMS 4701) @ Columbia University

Jan 2021 – Jan 2021 | New York, New York, United States

Teaching Assistant for the Artificial Intelligence class under Dr. Ansaf Salleb-Aouissi (co-taught by Dr. Tony Dear).

- Taught ML concepts (SVMs, random forests) and search algorithms (heuristic, adversarial, backtracking) in Python during weekly office hours.
- Graded student work, both coding and conceptual.

## Education

### B.S.E. in Biological Engineering (CBE), minor in Computer Science (PAC)

Princeton University

### M.S. in Computer Science, Machine Learning concentration

Columbia University

## Contact & Social

- LinkedIn: https://linkedin.com/in/joearedmond

---

Source: https://flows.cv/joeredmond

JSON Resume: https://flows.cv/joeredmond/resume.json

Last updated: 2026-04-05