Experience
2025 — Now
San Francisco, California
Working on large-scale LLM systems powering search, finance, and enterprise AI applications. Focused on agentic AI, RAG pipelines, and high-performance inference.
● Fine-tuned and deployed Sonar LLM (LLaMA 3.3 70B) using PyTorch and DeepSpeed on AWS, achieving 1,200+ tokens/sec inference throughput
● Built end-to-end RAG pipelines using LangGraph, FAISS, and Pinecone, raising factual accuracy to 92%+ for financial and enterprise use cases
● Integrated LLMs into multi-agent workflows using CrewAI, enabling SEC-compliant insights and increasing user engagement by 27%
● Engineered high-performance inference systems using vLLM, ONNX, and Triton, reducing latency and cloud costs
● Applied prompt engineering techniques (CoT, few-shot) and RLAIF to improve reasoning and QA performance by 18% across benchmarks
● Developed full-stack AI features using React (Next.js), TypeScript, and Node.js, enabling real-time agent interactions and dynamic AI-driven UI
● Designed scalable training and evaluation pipelines using Airflow and Databricks, supporting rapid experimentation and model benchmarking
● Implemented trust and safety frameworks, including moderation APIs and citation validation, ensuring compliant and reliable AI outputs
2024 — 2025
San Francisco, California
Focused on LLM systems, backend infrastructure, and scalable AI services for enterprise and internal platforms.
● Built scalable backend systems and microservices across hybrid cloud environments, improving system uptime by 40% and reducing issue resolution time by 30%
● Fine-tuned LLaMA models (8B & 70B) on domain-specific datasets, improving summarization and Q&A accuracy by 27% for enterprise users
● Developed RAG-based AI systems using LangChain, LlamaIndex, and Weaviate, reducing support ticket resolution time by 40%
● Built and deployed LLM-powered APIs and chat systems using FastAPI and Next.js, supporting 10K+ daily user interactions
● Optimized model inference using ONNX, quantization, and distributed GPU systems, achieving sub-200ms latency in production
● Implemented MLOps pipelines using MLflow, Prometheus, and AWS, enabling scalable model tracking, monitoring, and deployment
● Designed internal dashboards and developer tools using PostgreSQL, GraphQL, and REST APIs to monitor model performance and usage
● Ensured AI safety and compliance by implementing guardrails, PII filtering, and red-teaming workflows aligned with Responsible AI standards
2020 — 2023
India
Worked on ML models and AI systems in fintech and insurance domains, focusing on analytics, APIs, and deployment.
● Developed machine learning models for credit risk scoring using Python and scikit-learn, improving loan default prediction accuracy by ~18%
● Built AI-powered insurance solutions using TensorFlow and OpenCV, enabling automated health risk profiling and premium calculation
● Applied clustering techniques (KMeans, DBSCAN) to analyze customer behavior, increasing user engagement by 30% through personalization
● Developed RESTful APIs using FastAPI and Flask, deploying scalable services on AWS and Azure cloud platforms
● Contributed to full-stack development using React and Python backends, building dashboards and workflows for fintech and insurance clients
● Supported MLOps pipelines using MLflow, Airflow, and DVC, automating model versioning, tracking, and retraining processes
● Built data preprocessing and feature engineering pipelines to improve model performance and reliability across multiple use cases
● Delivered data visualization dashboards using Power BI and Streamlit, enabling business stakeholders to derive actionable insights
Education
San José State University
Master's Degree
Osmania University
Bachelor's Degree
All Saint's High School