(Backend / AI Systems)
• Developed a multimodal RAG system for domain-specific, low-resource language translation using
semantic chunking, derivative generation, question-answer pairing, summarization, and CLIP-based
captioning of images and tables, enabling text and media co-retrieval across 1,000 manually
collected federal and state documents.
• Increased top-1 accuracy from 73% to 97% (top-2 to 99.7%) on 30K+ generated queries across 10+
subject domains by optimizing model hyperparameters and diversifying training data, improving
user trust in compliance-related document search.
• Improved mean reciprocal rank (MRR) from 0.67 to 0.85 by designing a custom reranking mechanism,
increasing retrieval accuracy and reliability for legal and regulatory translation tasks.
• Fine-tuned custom and pre-trained transformer and CNN models for image classification, applying
appropriate data preparation, optimization, and telemetry techniques.
• Built classification and filtering components of a guardrails workflow that screen out 99% of
irrelevant, nonsensical, and toxic input, reducing unnecessary server bandwidth use and LLM costs.