The Challenges of Evaluating Text Generation Systems

There are many task-specific aspects, depending on what you're evaluating. But in general, what aspects of text generation systems would we want to evaluate? And are there any aspects that are generally applicable to all tasks?

Yeah, this is a very good question. I think there are generic characteristics of text that we, as humans, know how to judge just by reading a few sentences, without even looking at the entire data. So these are actually, like I said, trivial for humans, but ironically, they are not that easy for automatic metrics to capture. In my opinion, we should be evaluating all these tasks, but
