Experience
2025 — Present
United States
• Architected and scaled a production AI platform that operationalizes foundation models, transforming LLM prototypes into mission-critical systems and improving reasoning accuracy by 32%.
• Owned the end-to-end AI lifecycle, standardizing distributed fine-tuning (LoRA/PEFT, RLHF), inference optimization, deployment, monitoring, and automated retraining.
• Engineered large-scale RAG infrastructure supporting semantic search across 10M+ documents, reducing retrieval latency by 45% while maintaining high precision.
• Optimized GPU and cloud utilization through orchestration, batching, quantization, and caching strategies, cutting compute costs by 34% without performance trade-offs.
• Built resilient, horizontally scalable inference services with zero-downtime deployments, robust versioning, and rollback safeguards.
• Established model observability standards, implementing drift detection, hallucination tracking, and automated performance monitoring to mitigate degradation risks.
• Designed A/B experimentation frameworks to validate model upgrades, prompt strategies, and retrieval optimizations before production rollout.
• Partnered cross-functionally to align AI architecture with compliance, scalability, and Responsible AI governance standards (OWASP ML Top 10).
• Influenced AI platform roadmap and mentored engineers on distributed inference, MLOps automation, and production reliability best practices.
• Consolidated fragmented AI initiatives into a unified, scalable platform enabling secure, low-latency enterprise integration.
2022 — 2023
India
At TIAA, I worked on early enterprise adoption of LLM-powered NLP systems, primarily in healthcare and financial data domains. My focus was on building LLM-enabled text processing pipelines to extract insights from large volumes of unstructured data.
I developed domain-specific RAG solutions using Elasticsearch and vector embeddings, improving retrieval accuracy and downstream analytics performance. I also optimized LLM fine-tuning and inference workflows using PyTorch and TensorFlow, enabling faster model iteration and reduced inference latency.
I deployed containerized LLM inference APIs using Docker and Flask, and collaborated closely with analytics and product teams to integrate LLM outputs into decision-support and reporting systems. This role marked my transition from traditional NLP into practical, production-oriented LLM applications.
Tools: Hugging Face Transformers, LangChain, Elasticsearch, PyTorch, TensorFlow, Flask, Docker, Pandas, NumPy, Python
2021 — 2022
Hyderabad
At Cognizant, I built a strong foundation in NLP and transformer-based text analytics, working on large-scale data processing and model development pipelines. My work focused on pre-LLM transformer models, feature engineering, and evaluation workflows for text classification and analytics use cases.
I performed large-scale data preprocessing, automated SQL-driven data ingestion pipelines, and developed interactive Streamlit dashboards to visualize NLP model insights and performance metrics for stakeholders.
This role established my core NLP and transformer expertise, enabling a smooth transition into LLM-based systems in subsequent roles.
Tools: spaCy, NLTK, scikit-learn, Pandas, NumPy, SQL, Streamlit, Python
Education
University of North Texas