The Role of Human Evaluation in Your Work
The standard automatic evaluation methods don't work for usefulness, right? There's no lexical overlap metric that tells you whether a summary would be useful or not. It seems to me that ultimately, humans are the best evaluators of a summary, because it really depends on what they want out of it. But even though human evaluation might be able to measure usefulness, it's a lot of work, and you might not be able to find enough of the right users who can actually perform this task.
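The point about lexical overlap can be sketched concretely. The snippet below is an illustrative toy, not any specific metric implementation: the `unigram_f1` helper and both example summaries are invented for this sketch. A paraphrased summary that a reader might find useful can score near zero on unigram overlap, while a near-copy scores high.

```python
def unigram_f1(candidate: str, reference: str) -> float:
    """ROUGE-1-style F1: unigram overlap between candidate and reference."""
    cand = set(candidate.lower().split())
    ref = set(reference.lower().split())
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# Hypothetical reference and candidate summaries for illustration.
reference = "the company reported a sharp rise in quarterly profit"
paraphrase = "earnings grew strongly this quarter"    # useful, but no shared words
near_copy = "the company reported quarterly profit"   # high word overlap

print(unigram_f1(paraphrase, reference))  # 0.0 despite being a useful summary
print(unigram_f1(near_copy, reference))   # much higher score
```

The paraphrase conveys the same information but shares no exact tokens with the reference, so the overlap score gives it zero credit, which is exactly the gap between lexical metrics and usefulness described above.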