
120 - Evaluation of Text Generation, with Asli Celikyilmaz
NLP Highlights
The Disadvantages of Inter-Annotator Agreement
Human evaluation is the gold standard for evaluating text generation. But it seems to me that a big disadvantage is that humans don't always agree with each other. Unlike automated metrics, human judgments are subjective and depend on the task. Still, a little bit of expertise goes a long way here.
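One common way to quantify how much two annotators agree is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The snippet does not say which agreement statistic is discussed in the episode, so the sketch below is only an illustration under that assumption; the function name `cohen_kappa` and the example ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    Illustrative sketch: kappa = (p_o - p_e) / (1 - p_e), where p_o is the
    observed agreement rate and p_e is the agreement expected by chance
    from each annotator's label frequencies.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("Both annotators must label the same non-empty set of items.")

    n = len(labels_a)

    # Fraction of items on which the two annotators gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the two annotators' marginal label rates,
    # summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # If both annotators always use the same single label, kappa is undefined;
    # return 1.0 here as a convention for this sketch.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of four generated texts by two annotators.
annotator_1 = ["good", "good", "bad", "bad"]
annotator_2 = ["good", "bad", "bad", "bad"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

A value near 1 indicates strong agreement, a value near 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.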