
Is ChatGPT Getting Worse? with James Zou - #645
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Challenges in Evaluating Text Output of Language Models
One of the challenges for users of language models like ChatGPT is that concrete baselines and systematic comparisons are hard to build when the output is free-form text. Creating an evaluation methodology for ChatGPT's text results remains a big open challenge. For certain tasks, such as determining whether a number is prime, evaluation is simpler because the output can be compared to a known ground truth. For other types of questions, assessing the relative quality of textual output is still difficult.
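To make the ground-truth case concrete, here is a minimal Python sketch of how yes/no answers to "is this number prime?" could be scored automatically. It is an illustration only, not the method used in the episode or in any study it discusses: the helper names (`is_prime`, `parse_yes_no`, `accuracy`) and the sample responses are hypothetical and made up for this example.

```python
import re
from math import isqrt


def is_prime(n: int) -> bool:
    """Ground truth by trial division (fine for small n)."""
    if n < 2:
        return False
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def parse_yes_no(answer: str):
    """Return True/False for the first standalone yes/no in the text, else None."""
    match = re.search(r"\b(yes|no)\b", answer.lower())
    if match is None:
        return None
    return match.group(1) == "yes"


def accuracy(responses: dict) -> float:
    """Fraction of responses whose parsed answer matches the ground truth.

    Answers that cannot be parsed as yes/no count as wrong.
    """
    correct = sum(parse_yes_no(text) == is_prime(n) for n, text in responses.items())
    return correct / len(responses)


# Hypothetical model outputs, invented for illustration only.
sample_responses = {
    7: "Yes, 7 is a prime number.",
    9: "Yes, 9 is prime.",
    13: "[Yes]",
    15: "No. 15 = 3 x 5.",
}
score = accuracy(sample_responses)
print(score)
```

The scoring works only because each question has a single checkable answer. For open-ended text there is no equivalent of `is_prime` to compare against, which is the gap the snippet describes.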