Matt Ambrogi, a member of Buildspace Nights and Weekends, is exploring ways to improve the performance of data-supported chatbots, specifically those built on top of LLMs such as GPT. He plans to tackle four types of questions that these chatbots struggle with: questions that require the most recent answer from the data, subjective questions, generic or high-level questions, and questions that require aggregating facts from the data.

Before tackling these issues, Ambrogi needed a way to evaluate whether his bot was improving. He explores two qualitative strategies: forming an intuitive opinion and using feedback from users. He also briefly touches on programmatic evaluation, which involves asking the chatbot a series of questions and measuring what percentage of its responses pass a test. Even this approach is partly subjective, however, because it leaves several factors, such as what counts as a passing response, up to the engineer's interpretation. Overall, Ambrogi's post offers insights into evaluating data-supported chatbots built with LlamaIndex and GPT and can be useful for anyone working on chat-based products.
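The summary does not describe the exact harness Ambrogi uses, so the following is only a minimal sketch of the pass-rate idea behind programmatic evaluation. It assumes each test is a question paired with an engineer-written check on the answer; `pass_rate`, `toy_bot`, and the sample test cases are hypothetical stand-ins, not part of LlamaIndex or the original post.

```python
from typing import Callable, List, Tuple

# Hypothetical test case: a question plus an engineer-defined pass/fail check on the answer.
TestCase = Tuple[str, Callable[[str], bool]]


def pass_rate(ask: Callable[[str], str], cases: List[TestCase]) -> float:
    """Ask the bot every question and return the percentage of answers that pass their check."""
    if not cases:
        raise ValueError("need at least one test case")
    passed = sum(1 for question, check in cases if check(ask(question)))
    return 100.0 * passed / len(cases)


def toy_bot(question: str) -> str:
    """Hypothetical stand-in for a LlamaIndex/GPT-backed chatbot, returning canned answers."""
    canned = {
        "What is the latest release?": "Version 2.0 was released last week.",
        "Summarize the docs.": "I don't know.",
    }
    return canned.get(question, "I don't know.")


cases: List[TestCase] = [
    # Deciding what counts as "passing" is the engineer's judgment call.
    ("What is the latest release?", lambda a: "2.0" in a),
    ("Summarize the docs.", lambda a: "don't know" not in a.lower()),
]

if __name__ == "__main__":
    print(f"{pass_rate(toy_bot, cases):.1f}% of responses passed")  # prints "50.0% of responses passed"
```

The checks themselves are where the subjectivity mentioned above shows up: a keyword match is easy to automate but may pass a wrong answer or fail a correct one phrased differently.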

source update: How to Evaluate the Quality of LLM-based Chatbots – Towards AI
