
120 - Evaluation of Text Generation, with Asli Celikyilmaz
NLP Highlights
The Cost of Human Evaluation in Text Generation
Most research that's done in text generation usually also has human evaluation. It almost seems like that's the gold standard for evaluation. Is it because the idea is that the end consumers of text generation systems are usually humans? I think so. In intrinsic evaluation, we ask people to evaluate the quality of the generated text. Machine translation might be one example. This way, if there are n different types of criteria, we'll probably have n different types of intrinsic evaluations, which is very rich. And then you can go back and evaluate how your models are doing with these experiments.
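The idea of n criteria yielding n separate intrinsic evaluations can be sketched as follows. This is a minimal illustration, not anything described in the episode: the criterion names, rating scale, and scores are all hypothetical.

```python
# Hypothetical intrinsic human evaluation: annotators score each
# generated text on several criteria; each criterion is aggregated
# separately, giving one intrinsic evaluation per criterion.
from statistics import mean

# Illustrative ratings from three annotators on a 1-5 scale.
ratings = {
    "fluency":   [4, 5, 4],
    "coherence": [3, 4, 4],
    "adequacy":  [5, 4, 5],
}

# With n criteria, we get n separate per-criterion averages.
per_criterion = {crit: mean(scores) for crit, scores in ratings.items()}

for crit, avg in per_criterion.items():
    print(f"{crit}: {avg:.2f}")
```

Each criterion's average is reported on its own, so a model can be strong on, say, fluency while weak on coherence, which is what makes this style of evaluation rich.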