Challenges in Evaluating Text Output of Language Models | 2min snip from The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Is ChatGPT Getting Worse? with James Zou - #645

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

NOTE

Challenges in Evaluating Text Output of Language Models

One of the challenges for users of language models like ChatGPT is the difficulty of creating concrete baselines and systematic comparisons when the output is text.
Creating an evaluation methodology for ChatGPT's text output is a big open challenge.
For certain tasks, such as determining whether a number is prime, evaluation is simpler because the output can be compared to a known ground truth.
Assessing the relative quality of text output for other types of questions remains a challenge.
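
The prime-number case can be sketched in a few lines of Python. This is a minimal illustration, not the method discussed in the episode: the helper names (`is_prime`, `parse_yes_no`, `accuracy`) and the sample responses are hypothetical, and the parser assumes the model was asked to begin its answer with "yes" or "no".

```python
import math


def is_prime(n: int) -> bool:
    """Ground truth: trial division up to the integer square root."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0:
        return False
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def parse_yes_no(answer: str):
    """Map a free-text answer to True/False; return None if it is unclear.

    Assumption: the answer starts with "yes" or "no".
    """
    words = answer.lower().replace(".", " ").replace(",", " ").split()
    if not words:
        return None
    if words[0] == "yes":
        return True
    if words[0] == "no":
        return False
    return None


def accuracy(responses: dict[int, str]) -> float:
    """Fraction of responses that match the ground truth.

    Unparseable answers (None) never equal True or False, so they count as wrong.
    """
    correct = sum(parse_yes_no(text) == is_prime(n) for n, text in responses.items())
    return correct / len(responses)


# Hypothetical model outputs, made up for illustration only.
sample = {
    7: "Yes, 7 is prime.",
    9: "Yes.",
    10: "No, it is divisible by 2.",
    13: "I think so",
}
print(accuracy(sample))  # 0.5: 7 and 10 are right, 9 is wrong, 13 is unparseable
```

The comparison works only because every question has a single checkable answer. For open-ended text there is no equivalent of `is_prime` to compare against, which is the open challenge the note describes.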

