The Role of Human Evaluation in Your Work
The standard automatic evaluation methods don't work for usefulness, right? There's no lexical overlap metric that tells you whether a summary would be useful or not. It seems to me that ultimately, humans are the best evaluators of a summary, because it really depends on what they want out of it. But even though human evaluation might be able to measure usefulness, it's a lot of work, and you might not be able to find enough of the right users who can actually perform this task.
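The point about lexical overlap can be sketched concretely. The snippet below is an illustrative toy, not any specific metric implementation: the `unigram_f1` helper and both example summaries are invented for this sketch. A paraphrased summary that a reader might find useful can score near zero on unigram overlap, while a near-copy scores high.

```python
def unigram_f1(candidate: str, reference: str) -> float:
    """ROUGE-1-style F1: unigram overlap between candidate and reference."""
    cand = set(candidate.lower().split())
    ref = set(reference.lower().split())
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# Hypothetical reference and candidate summaries for illustration.
reference = "the company reported a sharp rise in quarterly profit"
paraphrase = "earnings grew strongly this quarter"    # useful, but no shared words
near_copy = "the company reported quarterly profit"   # high word overlap

print(unigram_f1(paraphrase, reference))  # 0.0 despite being a useful summary
print(unigram_f1(near_copy, reference))   # much higher score
```

The paraphrase conveys the same information but shares no exact tokens with the reference, so the overlap score gives it zero credit, which is exactly the gap between lexical metrics and usefulness described above.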