• Architected a scalable enterprise GenAI knowledge platform using LLMs, retrieval-augmented generation (RAG) pipelines, and LangChain on Oracle Cloud Infrastructure, cutting knowledge retrieval time by 60% for 8,000+ global users.
• Led development of a multi-agent AI orchestration system using LangGraph and LlamaIndex, enabling automated SQL generation and intelligent report synthesis and resulting in a 4x improvement in analyst productivity.
• Fine-tuned Llama 3 models with LoRA and QLoRA on 50K+ domain-specific documents, achieving 91% answer relevance, and deployed them via FastAPI and AWS Lambda with sub-120ms end-to-end latency.
• Optimized LLM inference pipelines using NVIDIA CUDA, PyTorch quantization, and Azure Kubernetes Service (AKS), reducing serving latency by 42% while supporting 250K+ daily requests at 99.9% uptime.
• Designed and implemented MLOps and production deployment standards using Docker, Kubernetes, MLflow, and CI/CD pipelines on GCP Vertex AI, reducing release cycle time by 50% across 15+ production ML systems.