# Prakash Kolluru

> AI/ML Engineer | Agentic AI • LLM-Powered RAG • GraphML Recommenders | Embeddings + ANN at Scale | Boosted Ranking Accuracy (NDCG +20%) | Reduced Denials 27% | Accelerated Reimbursements +$5M

Location: San Francisco Bay Area, United States

Profile: https://flows.cv/prakashkolluru

I’m a Computer Science professional with a Master’s degree from the University of California, Riverside, specializing in software development and applied machine learning. I bring hands-on expertise in building scalable web applications and intelligent systems, combining strong full-stack development skills (JavaScript, Python, Java, ReactJS, Spring Boot) with a deep interest in modern AI workflows, including Retrieval-Augmented Generation (RAG), large language models, and agentic AI solutions.

My background spans cloud infrastructure on AWS, microservices architecture, and containerized deployments with Docker and Kubernetes. I’m particularly passionate about exploring how machine learning and generative AI can be seamlessly integrated into real-world products to improve user experiences, automate workflows, and drive innovation.

I’m currently open to opportunities where I can contribute to designing, developing, and optimizing AI-powered applications and full-stack systems, whether that means fine-tuning models for specific domains, implementing vector search and retrieval pipelines, or engineering robust backend services to support intelligent features.

If your team is working on innovative software solutions or applying AI to solve meaningful challenges, I’d love to connect and explore how I can help bring those ideas to life. Let’s connect to explore potential collaborations or exchange insights on advancements in technology and development.

## Work Experience

### Software Engineer (Machine Learning) @ CareDocs.ai

Jan 2024 – Present | San Jose, California, United States

- Delivered the flagship clinician dashboard (React + FastAPI), powering real-time insights and documentation for 2,000+ professionals.
- Re-architected CosmosDB + Blob Storage pipelines, enabling sub-second retrieval on 10M+ fax records and saving $42K/year in infra costs.
- Engineered an MCP-driven system of AI agents for compliance validation, verification, billing code suggestion, and OCR automation, boosting documentation completeness to 98% and cutting claim rejections by 27%.
- Embedded hybrid ML + LLM workflows (LangChain + LangGraph) for retrieval, summarization, and compliance validation, improving claim FPY from 78% to 93% and scaling to 10K+ claims/day.
- Achieved 92% F1-score (BLEU=0.84) on coding accuracy and 96% OCR classification accuracy, ensuring production-grade reliability.
- Built modular FastAPI microservices deployed via AWS Lambda + API Gateway for session retrieval, PDF generation, and progress note validation, achieving 99% reliability at scale.
- Integrated FHIR-compliant microservices to sync patient/encounter data with PCC, achieving <200ms latency and removing manual billing payload entry.
- Established MLOps pipelines (Terraform + SageMaker + GitHub Actions) with drift detection and automated retraining, reducing model downtime and accelerating iteration cycles.

### Full Stack Engineer @ Vitals7

Jan 2023 – Jan 2024 | New York, United States

- Designed and delivered a full-stack vitals monitoring platform built with React and FastAPI (Python), deployed on AWS EKS for high availability and scale.
- Built a patient–provider recommendation engine, combining Collaborative Filtering, Gradient Boosted Decision Trees, and Deep Cross Networks (DCN v2) to model patient histories, provider specialties, and payer rules.
- Integrated LLaMA 3.1 and CodeLLaMA for prompt-driven clinical alerts and automated data pipelines, enabling clinicians to view live patient data with sub-100ms updates and reducing alert latency by 30%.
- Deployed embeddings in pgvector/Pinecone for fast retrieval and built ranking APIs in FastAPI with strict validation and async I/O. The pipeline scaled to 100K+ daily recommendations, delivering sub-200ms response times, improving NDCG@10 by 18%, and increasing appointment conversions by 25% in A/B tests.
- Implemented a Retrieval-Augmented Generation (RAG) pipeline with FAISS + LLaMA 3.1 on AWS S3 to provide context-aware triage classification and automate clinical audits.
- Engineered predictive health risk models in AWS SageMaker, leveraging Isolation Forest for anomaly detection, LightGBM for short-term risk prediction, and LSTMs for 24–48h time-series forecasting. These models processed continuous vitals data streams, reaching 92% precision in anomaly detection and 20% fewer false positives in deterioration forecasts.
- Developed multimodal AI-powered clinical workflows, embedding a WebRTC-based video consultation feature with real-time audio transcription via STT. The transcripts were passed into fine-tuned GPT-4o and LLaMA 3.1 models for summarization and structured alert generation.

### Teaching Assistant @ University of California, Riverside

Jan 2023 – Jan 2023 | Riverside, California, United States

As a Teaching Assistant for CS141: Intermediate Data Structures and Algorithms, I enhanced student engagement by incorporating real-world examples, leading dynamic discussions to deepen understanding, and fostering critical thinking with challenging exams. I also led a team of eight graders and a fellow teaching assistant, promoting a collaborative and supportive educational environment.
I managed a diverse class of 150 students, maintaining high attendance and proactive involvement through regular office hours, encouraging participation, and offering targeted guidance to boost learning outcomes. I developed clear, fair evaluation rubrics for various assessments and received exceptional feedback.

### Data Engineer @ kipi.bi

Jan 2020 – Jan 2022 | Hyderabad, Telangana, India

- Developed FastAPI-based microservices for metadata extraction, ETL control, and pipeline auditing, cutting manual analytics intervention by 70%.
- Integrated MuleSoft APIs with Python services for seamless enterprise data exchange.
- Deployed production-ready recommendation services handling 100K+ daily requests with sub-150ms p95 latency, validated through A/B testing on add-to-cart and bundle take rates.
- Automated dynamic reporting using Pandas + Jinja2, generating configurable Excel/PDF exports for C-level stakeholders, reducing reporting turnaround from hours to minutes.
- Provisioned cloud infrastructure with Terraform, automating AWS RDS, ECS, S3, and Secrets Manager with fine-grained IAM policies.
- Containerized services with Docker and orchestrated via ECS Fargate, achieving 99.99% uptime across production workloads.

## Education

### Master of Science - MS in Computer Engineering

University of California, Riverside

## Contact & Social

- LinkedIn: https://linkedin.com/in/prakash-kolluru

---

Source: https://flows.cv/prakashkolluru

JSON Resume: https://flows.cv/prakashkolluru/resume.json

Last updated: 2026-03-29