How to Evaluate a Debate?
When we evaluate individual components, we use the standard paradigm of supervised learning. We need to be very rigorous in terms of benchmark data development and things like that. But when you talk about evaluating the system as a whole, this is a different story. It's not that clear how to evaluate the system. A debate, in our format, can last for 25 minutes. The average score of human debaters was around four. And I think the system came close to expert level on some topics.