Experience
2024 — Now
● AI Chatbot for User Access Analytics
• Designed and developed a SQL chatbot in Python with LangChain and LangGraph, powered by ChatGPT-4.0, building AI flows that let users interact with a database of user access data.
• Engineered and deployed the chatbot application in a Dockerized environment, ensuring robust scalability, portability, and production-level reliability.
• Implemented a Retrieval-Augmented Generation (RAG) approach with ChromaDB, improving response accuracy and relevance by efficiently retrieving similar examples.
• Utilized Python and Boto3 with AWS Athena for scalable database queries and S3 for storing results and outputs.
• Developed a REST API using Python Flask to enable seamless communication between the chatbot backend and a Node.js-based frontend, ensuring efficient data exchange and a smooth user experience.
2023 — 2024
Santa Clara County, California, United States
● Spares Forecasting Machine Learning Project
• Authored SQL queries in Hadoop to extract and transform data from multiple sources into a unified dataset.
• Performed data cleaning and exploratory data analysis using Pandas and Matplotlib, identifying errors, anomalies, and trends in the dataset.
• Applied mean imputation to handle missing values, target encoding for categorical variables, and information gain for feature selection, resulting in a more refined dataset for modeling.
• Experimented with multiple models and identified LightGBM as the best performer through cross-validation, reducing forecast error (MAPE) by 80% compared to baseline models.
• Deployed the LightGBM model in production, enabling risk assessment and sales forecasting, with results visualized through Tableau and Power BI dashboards to support business decisions.
2023 — 2024
Santa Cruz, California, United States
• Conducted an in-depth analysis of Multimodal Large Language Models (MLLMs) in Visual Question Answering (VQA) tasks across both synthetic and real-world datasets.
• Compared four distinct MLLM methods (GPT-4V, Gemini Pro Vision, LLaVA 1.6, and PICa) using the OK-VQA, GQA, VQA v2, and AGVQA datasets.
• Sorted VQA questions into perception and cognition tasks, rigorously testing model skills in object recognition and complex reasoning.
• Outlined strengths and limitations of current MLLM methodologies, providing actionable insights to enhance model training and scalability in production environments.
2022 — 2024
• Facilitated student understanding of key Probability and Statistics concepts, including Bayes' Theorem and statistical inference, by demonstrating practical applications in experimental design and the law of large numbers.
• Taught essential math concepts for machine learning (probability, linear algebra, and optimization) through practical examples and algorithm demonstrations to strengthen students' model evaluation skills.
• Guided students in mastering Python programming fundamentals, covering data types, control flow, methods, and Object-Oriented Programming (OOP), to build a strong foundation for advanced ML implementations.
2022 — 2023
Santa Cruz, California, USA
• Led the development of a specialized Visual Question Answering (VQA) dataset tailored for the agricultural domain, enhancing the accuracy of machine learning models in answering domain-specific queries.
• Used Python and Beautiful Soup (BS4) to scrape relevant forum data from Agtalk, enriching the dataset’s comprehensiveness and relevance.
• Applied Natural Language Processing (NLP) methods, including regex filtering and fine-tuning a BERT model from Hugging Face, to accurately identify and categorize pertinent questions.
Education
University of California, Santa Cruz
Master's
University of California, Santa Cruz