Built a Retrieval-Augmented Generation (RAG) chatbot over company documentation, with OpenWebUI as the front-end chat interface, Weaviate as the vector store, and Ollama + vLLM for efficient LLM inference.
Experimented with text chunking strategies, including sentence-transformer-based splitting and recursive chunking, to optimize document indexing.
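The recursive chunking strategy mentioned above can be sketched in plain Python. The separator hierarchy and size limit here are illustrative defaults, not the values used in the project:

```python
def recursive_chunk(text, max_len=500, separators=("\n\n", "\n", ". ", " ")):
    """Split text by the coarsest separator that yields pieces under max_len,
    recursing with finer separators on any piece that is still too long."""
    if len(text) <= max_len:
        return [text]
    for i, sep in enumerate(separators):
        if sep in text:
            chunks = []
            for part in text.split(sep):
                if part:
                    chunks.extend(recursive_chunk(part, max_len, separators[i + 1:]))
            return chunks
    # No separator left: hard-split at max_len as a last resort.
    return [text[j:j + max_len] for j in range(0, len(text), max_len)]
```

Production splitters (e.g. LangChain's `RecursiveCharacterTextSplitter`) follow the same idea but add chunk overlap so context is not lost at chunk boundaries.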
Integrated multiple embedding models (e.g., nomic-embed-text, among others) and evaluated them across different retrieval methods: semantic, hybrid, and keyword-based.
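One common way to fuse the semantic and keyword rankings in a hybrid setup is reciprocal rank fusion (RRF). This sketch assumes each retriever returns an ordered list of document IDs; the constant `k=60` is the conventional RRF default, not a value from the project:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one ranking.
    Each doc scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = defaultdict(float)
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["d3", "d1", "d7"]   # from vector search (illustrative IDs)
keyword  = ["d1", "d9", "d3"]   # from keyword/BM25 search
fused = reciprocal_rank_fusion([semantic, keyword])
```

Documents ranked well by both retrievers (here `d1` and `d3`) rise to the top, which is why hybrid retrieval often beats either method alone.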
Tuned retrieval performance by varying top-k selection and integrated Cohere's reranker to improve response relevance.
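The retrieve-then-rerank pattern behind this can be sketched as follows. The `search_fn` and `rerank_fn` callables are hypothetical stand-ins for the vector-store query and the Cohere rerank call:

```python
def retrieve_and_rerank(query, search_fn, rerank_fn, top_k=20, top_n=5):
    """Over-fetch top_k candidates from the first-stage retriever, then let a
    (typically slower, more accurate) reranker pick the top_n most relevant."""
    candidates = search_fn(query, top_k)                     # cheap first stage
    scored = [(rerank_fn(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)      # best score first
    return [doc for _, doc in scored[:top_n]]
```

The design point is the `top_k`/`top_n` split: a large `top_k` keeps first-stage recall high, while the reranker restores precision before the chunks reach the prompt.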
Engineered robust context window strategies, incorporating message history, retrieved chunks, and conversation summarization for long chats to maximize model utility within token limits.
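A minimal sketch of that budgeting logic, assuming a whitespace word count as a stand-in for the model's real tokenizer (all names here are illustrative): the system prompt and running summary always fit, then the newest history turns, then retrieved chunks, until the budget is exhausted.

```python
def build_context(system_prompt, summary, history, chunks, budget=4096,
                  count=lambda s: len(s.split())):
    """Assemble prompt parts under a token budget, preferring recent history
    turns and highest-ranked retrieved chunks."""
    parts = [system_prompt, summary]
    used = sum(count(p) for p in parts)
    for turn in reversed(history):          # newest turns get priority
        if used + count(turn) > budget:
            break
        parts.insert(2, turn)               # re-insert in chronological order
        used += count(turn)
    for chunk in chunks:                    # chunks assumed ranked best-first
        if used + count(chunk) > budget:
            break
        parts.append(chunk)
        used += count(chunk)
    return "\n\n".join(parts)
```

The summarization piece slots in naturally: once old turns stop fitting, they are folded into `summary` instead of being dropped outright.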
Ran extensive evaluations using DeepEval to benchmark the RAG pipeline across various configurations and model choices.
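The configuration sweep behind such benchmarking can be sketched generically (this is a plain grid-search harness, not the DeepEval API; the `evaluate` callable would wrap whatever metric suite is in use, and the grid values are illustrative):

```python
import itertools

def sweep_configs(grid, evaluate):
    """Run an evaluation function over every combination in a config grid
    and return (score, config) pairs sorted best-first."""
    keys = list(grid)
    results = []
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        results.append((evaluate(config), config))
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results

grid = {
    "embedder": ["nomic-embed-text", "other-model"],
    "retrieval": ["semantic", "hybrid", "keyword"],
    "top_k": [3, 5, 10],
}
```

With this grid, `sweep_configs` evaluates all 18 combinations and surfaces the best-scoring pipeline configuration.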
Conducted performance experiments with several state-of-the-art LLMs, including Llama 3.3 70B, Mixtral 8x7B MoE, and DeepSeek R1 (32B & 70B).