# Punitha Pokala

> AI ENGINEER | Machine Learning | AI Platform & MLOps | LLM Systems | RAG | Distributed Training

Location: United States
Profile: https://flows.cv/punitha

I am an AI Engineer specializing in architecting scalable AI platforms that operationalize machine learning and foundation models in real-world production environments. My focus is not just on building models, but on engineering reliable, observable, high-performance AI systems that deliver measurable business impact.

Over the past several years, I have worked across NLP, LLM systems, and AI platform engineering, designing end-to-end ML pipelines spanning data ingestion, feature engineering, distributed training, fine-tuning (LoRA/PEFT, RLHF), inference optimization, deployment, monitoring, and automated retraining workflows. I have led the development of Retrieval-Augmented Generation (RAG) systems supporting large-scale semantic search, reducing latency by 45% and improving contextual reasoning accuracy by 32% in production systems. My experience includes optimizing GPU and cloud infrastructure efficiency, implementing model observability and drift-detection frameworks, and deploying containerized AI services with zero-downtime reliability. I am deeply interested in distributed systems, scalable inference, and performance-driven AI architecture.

Earlier in my career, I built transformer-based NLP models and fine-tuned large language models using PyTorch and TensorFlow, improving model efficiency by 37% and retrieval precision by 33%. This foundation in applied machine learning evolved into broader AI platform ownership and MLOps leadership. I am particularly passionate about building AI systems that move beyond experimentation into secure, governed, enterprise-grade deployment aligned with Responsible AI principles.
I'm always open to connecting with teams working on scalable ML systems, foundation model infrastructure, AI platform engineering, and next-generation AI applications.

## Work Experience

### Senior AI Engineer (AI Platform & MLOps) @ UnitedHealth Group
Jan 2025 – Present | United States

- Architected and scaled a production AI platform operationalizing foundation models, transforming LLM prototypes into mission-critical systems and improving reasoning accuracy by 32%.
- Owned the end-to-end AI lifecycle, standardizing distributed fine-tuning (LoRA/PEFT, RLHF), inference optimization, deployment, monitoring, and automated retraining.
- Engineered large-scale RAG infrastructure supporting semantic search across 10M+ documents, reducing retrieval latency by 45% while maintaining high precision.
- Optimized GPU and cloud utilization through orchestration, batching, quantization, and caching strategies, cutting compute costs by 34% without performance trade-offs.
- Built resilient, horizontally scalable inference services with zero-downtime deployments, robust versioning, and rollback safeguards.
- Established model observability standards, implementing drift detection, hallucination tracking, and automated performance monitoring to mitigate degradation risks.
- Designed A/B experimentation frameworks to validate model upgrades, prompt strategies, and retrieval optimizations before production rollout.
- Partnered cross-functionally to align AI architecture with compliance, scalability, and Responsible AI governance standards (OWASP ML Top 10).
- Influenced the AI platform roadmap and mentored engineers on distributed inference, MLOps automation, and production reliability best practices.
- Consolidated fragmented AI initiatives into a unified, scalable platform enabling secure, low-latency enterprise integration.
### AI/ML Engineer – NLP & LLM Systems @ TIAA
Jan 2022 – Jan 2023 | India

At TIAA, I worked on early enterprise adoption of LLM-powered NLP systems, primarily in the healthcare and financial data domains. My focus was on building LLM-enabled text processing pipelines to extract insights from large volumes of unstructured data. I developed domain-specific RAG solutions using ElasticSearch and vector embeddings, improving retrieval accuracy and downstream analytics performance. I optimized LLM fine-tuning and inference workflows using PyTorch and TensorFlow, enabling faster model iteration and reduced inference latency. I deployed containerized LLM inference APIs using Docker and Flask, and collaborated closely with analytics and product teams to integrate LLM outputs into decision-support and reporting systems. This role marked my transition from traditional NLP into practical, production-oriented LLM applications.

Tools: Hugging Face Transformers, LangChain, ElasticSearch, PyTorch, TensorFlow, Flask, Docker, Pandas, NumPy, Python

### Data Scientist (NLP & Transformer Models) @ Cognizant
Jan 2021 – Jan 2022 | Hyderabad

At Cognizant, I built a strong foundation in NLP and transformer-based text analytics, working on large-scale data processing and model development pipelines. My work focused on pre-LLM transformer models, feature engineering, and evaluation workflows for text classification and analytics use cases. I performed large-scale data preprocessing, automated SQL-driven data ingestion pipelines, and developed interactive Streamlit dashboards to visualize NLP model insights and performance metrics for stakeholders. This role established my core NLP and transformer expertise, which later enabled a smooth transition into LLM-based systems in subsequent roles.
Tools: spaCy, NLTK, Scikit-learn, Pandas, NumPy, SQL, Streamlit, Python

## Education

### Data Science
University of North Texas

## Contact & Social

- LinkedIn: https://linkedin.com/in/pokala-punitha2001
- Portfolio: https://shorturl.at/tYFws
- GitHub: https://github.com/punithapokala01-lgtm
- Portfolio: https://portfolio-animator-11.preview.emergentagent.com

---

Source: https://flows.cv/punitha
JSON Resume: https://flows.cv/punitha/resume.json
Last updated: 2026-04-18