Led a cross-functional team of 6 engineers to develop a RoBERTa-based machine learning pipeline using PyTorch and Hugging Face Transformers, enhancing Cody, an AI assistant, and achieving a 10% increase in response accuracy.
•Developed a comprehensive dataset by collecting documentation and issue pages from over 10,000 open-source repositories.
•Utilized web scraping tools such as Beautiful Soup and Scrapy to gather repository data and rank it against more than 150 queries with known ground truths.
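The ranking setup behind this bullet can be sketched without the scraping machinery. The snippet below is a minimal illustration, not the production code: it scores documents against a query by weighted term overlap (a rough TF stand-in for whatever relevance signal was actually used) and sorts them best-first. The `score` and `rank` helpers are hypothetical names.

```python
from collections import Counter

def score(query, doc):
    """Score a document against a query by term overlap,
    weighted by how often each query term appears in the doc."""
    q_terms = query.lower().split()
    d_counts = Counter(doc.lower().split())
    return sum(d_counts[t] for t in q_terms)

def rank(query, docs):
    """Return docs sorted best-first by the overlap score."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)

docs = [
    "installation guide for the build system",
    "issue: crash when parsing config files",
    "how to configure the parser for config files",
]
ranked = rank("parsing config files", docs)
```

With known ground truths per query, the position of the correct document in `ranked` gives a direct training and evaluation signal.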
•Preprocessed and tokenized text data using RobertaTokenizer to prepare it for model training.
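The real pipeline used Hugging Face's `RobertaTokenizer`, which needs the pretrained RoBERTa vocabulary; as a self-contained stand-in, the sketch below mimics only the shape of that preprocessing step (ID mapping, truncation, padding, attention mask) with a toy whitespace tokenizer. The token IDs are illustrative, not RoBERTa's.

```python
def toy_encode(text, vocab, max_length=8, pad_id=1):
    """Mimic tokenizer(text, truncation=True, padding="max_length"):
    map tokens to IDs, truncate to max_length, pad the remainder,
    and emit an attention mask (1 = real token, 0 = padding)."""
    ids = [vocab.setdefault(tok, len(vocab)) for tok in text.lower().split()]
    ids = ids[:max_length]
    mask = [1] * len(ids) + [0] * (max_length - len(ids))
    ids = ids + [pad_id] * (max_length - len(ids))
    return {"input_ids": ids, "attention_mask": mask}

# Reserve IDs 0 and 1 so new tokens never collide with the pad ID.
vocab = {"<s>": 0, "<pad>": 1}
enc = toy_encode("How do I parse config files", vocab, max_length=8)
```

Fixed-length IDs plus an attention mask are exactly what the model consumes as batched tensors during training.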
•Trained and optimized the RoBERTa model to rank and return the top-k documentation/issue combinations that most accurately answered each query.
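Once the model assigns a relevance score to each (query, context) pair, the top-k retrieval step reduces to a selection problem. The sketch below assumes a hypothetical `score_fn` standing in for the trained ranker; the simple shared-word scorer is only a placeholder.

```python
import heapq

def top_k_contexts(query, contexts, score_fn, k=3):
    """Return the k highest-scoring documentation/issue contexts
    for the query, per the supplied relevance scorer."""
    return heapq.nlargest(k, contexts, key=lambda c: score_fn(query, c))

# Placeholder scorer: count of words shared between query and context.
def overlap_score(query, context):
    return len(set(query.lower().split()) & set(context.lower().split()))

contexts = [
    "parse config files",
    "build the docs",
    "config parser issue",
    "release notes",
]
best = top_k_contexts("how to parse a config file", contexts, overlap_score, k=2)
```

`heapq.nlargest` keeps the selection O(n log k), which matters when scoring contexts drawn from 10,000+ repositories.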
•Evaluated model performance by comparing outputs with ground truths, measuring response improvements using different context combinations.
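One standard way to compare ranked outputs against known ground truths is mean reciprocal rank (MRR); the exact metrics used here are not stated, so this is an illustrative example rather than the project's evaluation code.

```python
def mean_reciprocal_rank(ranked_lists, ground_truths):
    """For each query, find the rank of the first ground-truth
    context in the model's ranked output; average 1/rank.
    Queries whose ground truth never appears contribute 0."""
    total = 0.0
    for ranked, truth in zip(ranked_lists, ground_truths):
        for i, item in enumerate(ranked, start=1):
            if item == truth:
                total += 1.0 / i
                break
    return total / len(ranked_lists)

# Three queries; the ground truth "a" appears at ranks 1, 2, and 3.
ranked_lists = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
truths = ["a", "a", "a"]
mrr = mean_reciprocal_rank(ranked_lists, truths)
```

Running the same metric over outputs built from different context combinations makes the "response improvements" in this bullet directly comparable.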
•Analyzed data to identify the most useful context sources, enhancing the model's ability to select relevant information and improving overall response accuracy.
•Communicated complex technical concepts to technical and non-technical stakeholders, facilitating better decision-making across departments.