AI/ML Engineer with 5+ years of experience designing, building, and deploying production-grade machine learning and deep learning systems.
Experience
2025 — Now
California, United States
• Contributed to the development and extension of GPU-accelerated Python inference services using FastAPI, supporting transformer-based models for internal reference and customer validation workloads.
• Worked on Retrieval-Augmented Generation (RAG) prototypes using LangChain for orchestration and LlamaIndex for document ingestion, focused on internal benchmarking and customer proof-of-concept demonstrations.
• Evaluated vector retrieval approaches using FAISS (GPU) for internal experimentation and supported Pinecone-based setups exclusively for external customer PoCs, maintaining clear separation between tooling.
• Applied parameter-efficient fine-tuning techniques (LoRA / QLoRA) during controlled experimentation using Hugging Face Transformers, leveraging pre-trained checkpoints and existing GPU infrastructure.
• Supported data preprocessing and evaluation workflows using Pandas, NumPy, and SQL to enable structured experimentation, offline evaluation, and performance comparison across model variants.
• Assisted in optimizing inference workflows through ONNX export and collaboration on TensorRT benchmarking, observing improvements in latency and throughput.
• Containerized ML services using Docker and collaborated with platform teams to deploy GPU workloads on Kubernetes, emphasizing scalability and efficient resource utilization.
• Used MLflow for experiment tracking and model version comparison within prototyping and benchmarking pipelines to support reproducibility.
• Supported customer-facing validation workloads on AWS (EC2 GPU instances, EKS, S3, ECR), assisting with deployment verification, benchmarking, and technical demonstrations.
• Participated in internal model evaluation discussions, contributing analysis on LLM behavior, hallucination patterns, and grounding quality in RAG systems under senior guidance.
2020 — 2023
India
• Developed and deployed end-to-end machine learning pipelines using Python, PyTorch, and TensorFlow for enterprise-facing applications.
• Built supervised learning models (classification and regression) on large, structured datasets, improving prediction accuracy by 10–15% through feature engineering and iterative experimentation.
• Performed data preprocessing, feature engineering, and EDA using Pandas, NumPy, and Scikit-learn, enabling stable and reusable training pipelines.
• Integrated ML models into production systems in collaboration with software engineering teams, following internal deployment and validation standards.
• Deployed and monitored models using Azure Machine Learning, leveraging cloud-based compute for training and batch inference workloads.
• Conducted hyperparameter tuning and A/B testing to improve model performance, stability, and inference efficiency.
• Used Jupyter Notebooks and visualization tools to communicate experimental results and insights to technical and business stakeholders.
• Worked within Agile/Scrum development processes, participating in sprint planning, backlog grooming, daily stand-ups, and retrospectives to deliver incremental ML features.
Education
Montclair State University
Master's degree
GITAM Deemed University