# Anil Barla

> AI/ML Engineer @ NVIDIA | Generative AI, RAG & LLM Optimization | GPU Inference | MLOps | AWS/Azure

Location: San Francisco, California, United States

Profile: https://flows.cv/anilbarla

AI/ML Engineer with 5+ years of experience designing, building, and deploying production-grade machine learning and deep learning systems. Currently at NVIDIA, contributing to GPU-accelerated inference services, transformer-based models, and Retrieval-Augmented Generation (RAG) pipelines for high-performance AI workloads.

Previously worked at Microsoft, where I developed and deployed end-to-end ML pipelines using PyTorch and TensorFlow, improving model accuracy and building scalable solutions for enterprise applications. Strong expertise in Python, LLMs, FastAPI, MLflow, Docker, Kubernetes, AWS, and Azure.

Passionate about building scalable, efficient, and responsible AI systems that bridge research and real-world impact.

## Work Experience

### AI/ML Engineer @ NVIDIA

Jan 2025 – Present | California, United States

• Contributed to the development and extension of GPU-accelerated Python inference services using FastAPI, supporting transformer-based models for internal reference and customer validation workloads.
• Worked on Retrieval-Augmented Generation (RAG) prototypes using LangChain for orchestration and LlamaIndex for document ingestion, focused on internal benchmarking and customer proof-of-concept demonstrations.
• Evaluated vector retrieval approaches using FAISS (GPU) for internal experimentation and supported Pinecone-based setups exclusively for external customer PoCs, maintaining clear separation between tooling.
• Applied parameter-efficient fine-tuning techniques (LoRA / QLoRA) during controlled experimentation using Hugging Face Transformers, leveraging pre-trained checkpoints and existing GPU infrastructure.
• Supported data preprocessing and evaluation workflows using Pandas, NumPy, and SQL to enable structured experimentation, offline evaluation, and performance comparison across model variants.
• Assisted in optimizing inference workflows through ONNX export and collaboration on TensorRT benchmarking, observing improvements in latency and throughput.
• Containerized ML services using Docker and collaborated with platform teams to deploy GPU workloads on Kubernetes, emphasizing scalability and efficient resource utilization.
• Used MLflow for experiment tracking and model version comparison within prototyping and benchmarking pipelines to support reproducibility.
• Supported customer-facing validation workloads on AWS (EC2 GPU instances, EKS, S3, ECR), assisting with deployment verification, benchmarking, and technical demonstrations.
• Participated in internal model evaluation discussions, contributing analysis on LLM behavior, hallucination patterns, and grounding quality in RAG systems under senior guidance.

### Machine Learning Engineer @ Microsoft

Jan 2020 – Jan 2023 | India

• Developed and deployed end-to-end machine learning pipelines using Python, PyTorch, and TensorFlow for enterprise-facing applications.
• Built supervised learning models (classification and regression) on large, structured datasets, improving prediction accuracy by 10–15% through feature engineering and iterative experimentation.
• Performed data preprocessing, feature engineering, and EDA using Pandas, NumPy, and Scikit-learn, enabling stable and reusable training pipelines.
• Integrated ML models into production systems in collaboration with software engineering teams, following internal deployment and validation standards.
• Deployed and monitored models using Azure Machine Learning, leveraging cloud-based compute for training and batch inference workloads.
• Conducted hyperparameter tuning and A/B testing to improve model performance, stability, and inference efficiency.
• Used Jupyter Notebooks and visualization tools to communicate experimental results and insights to technical and business stakeholders.
• Worked within Agile/Scrum development processes, participating in sprint planning, backlog grooming, daily stand-ups, and retrospectives to deliver incremental ML features.

## Education

### Master's degree in Computer Science

Montclair State University

### Bachelor of Technology - BTech in Computer Science

GITAM Deemed University

## Contact & Social

- LinkedIn: https://linkedin.com/in/anilbarla283

---

Source: https://flows.cv/anilbarla

JSON Resume: https://flows.cv/anilbarla/resume.json

Last updated: 2026-04-17