•Built production LLM pipeline transforming therapy transcripts into clinical documentation through multi-stage generation—entity extraction, contextual retrieval, structured synthesis—reducing clinician documentation time by 90%.
•Engineered RAG system combining vector search and property graphs to ground LLM outputs in patient history, assessments, and EHR data, reducing hallucinations in safety-critical healthcare applications.
•Designed multi-provider inference infrastructure with intelligent routing across OpenAI, Fireworks, and Together AI, implementing streaming, dynamic fallbacks, and cost-aware distribution for thousands of daily requests.
•Developed prompt engineering framework with self-evaluation loops where models iteratively refine outputs against clinical accuracy and compliance criteria, reducing human review time by 60%.
•Built ML API service for NLP preprocessing (transcript segmentation, NER, sentiment analysis, behavioral markers) feeding downstream clinical analytics and knowledge graph pipelines.
•Implemented structured generation with Zod/Pydantic validators to constrain LLM outputs to strict clinical schemas, ensuring type safety and seamless EHR integration.
•Created observability infrastructure tracking token usage, latency, and quality metrics across prompt templates, enabling data-driven optimization and A/B testing of generation strategies.
•Architected asynchronous processing with Redis-backed queues for compute-intensive AI workloads, plus secure HIPAA-compliant APIs with JWT auth across several endpoints.
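The structured-generation bullet above can be sketched as a schema check on raw model output before it reaches the EHR. This is a minimal stdlib-only illustration (the production system used Pydantic/Zod validators, per the bullet); the `ProgressNote` fields are hypothetical, not the actual clinical schema.

```python
from dataclasses import dataclass, fields

@dataclass
class ProgressNote:
    # Hypothetical clinical schema; field names are illustrative only.
    patient_id: str
    subjective: str
    assessment: str
    plan: str

def parse_note(raw: dict) -> ProgressNote:
    """Reject extra keys and non-string values so only schema-conforming
    notes flow downstream to EHR integration."""
    expected = {f.name for f in fields(ProgressNote)}
    extra = set(raw) - expected
    if extra:
        raise ValueError(f"unexpected keys: {extra}")
    for name in expected:
        if not isinstance(raw.get(name), str):
            raise TypeError(f"{name} must be a string")
    return ProgressNote(**raw)
```

In practice a validation failure would trigger a constrained re-generation rather than a hard error.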
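The multi-provider routing bullet could follow a pattern like this sketch: try providers cheapest-first and fall back to the next on any failure. The provider tuple shape and `call` signature are assumptions for illustration, not the production API.

```python
from typing import Callable

class AllProvidersFailed(Exception):
    """Raised when every configured provider errors out."""

def route(providers: list[tuple[str, float, Callable[[str], str]]],
          prompt: str) -> tuple[str, str]:
    """Cost-aware dispatch: sort providers by per-token cost, try each
    in turn, and fall back dynamically when one raises."""
    for name, _cost, call in sorted(providers, key=lambda p: p[1]):
        try:
            return name, call(prompt)
        except Exception:
            continue  # provider down or rate-limited: fall back to next
    raise AllProvidersFailed(prompt)
```

A production router would also weigh latency and per-model quality, and stream tokens rather than return a whole string.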
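The self-evaluation loop bullet reduces to a generate/critique/revise cycle; a minimal sketch, assuming `generate` and `evaluate` wrap LLM calls (the feedback-prompt format here is hypothetical):

```python
from typing import Callable

def refine(generate: Callable[[str], str],
           evaluate: Callable[[str], tuple[bool, str]],
           prompt: str,
           max_rounds: int = 3) -> str:
    """Draft, then let an evaluator score the draft against accuracy and
    compliance criteria; revise with the feedback until it passes or the
    round budget is spent."""
    draft = generate(prompt)
    for _ in range(max_rounds):
        ok, feedback = evaluate(draft)
        if ok:
            return draft
        draft = generate(f"{prompt}\n\nRevise to address: {feedback}")
    return draft  # best effort after max_rounds; flag for human review
```

Capping rounds keeps cost bounded; drafts that never pass are escalated rather than looped forever.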
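The observability bullet amounts to aggregating token and latency stats per prompt template so variants can be A/B-compared. A dependency-free sketch (class and field names are illustrative):

```python
from collections import defaultdict

class PromptMetrics:
    """Accumulate token usage and latency per prompt template."""

    def __init__(self) -> None:
        self._runs: dict[str, list[tuple[int, float]]] = defaultdict(list)

    def record(self, template: str, tokens: int, latency_ms: float) -> None:
        self._runs[template].append((tokens, latency_ms))

    def summary(self, template: str) -> dict:
        """Per-template averages, the inputs to A/B comparison."""
        runs = self._runs[template]
        n = len(runs)
        return {
            "calls": n,
            "avg_tokens": sum(t for t, _ in runs) / n,
            "avg_latency_ms": sum(l for _, l in runs) / n,
        }
```

In production these counters would feed a metrics backend alongside quality scores, not sit in memory.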