How Do You Measure the Performance of a Language Model?
I like the way that you introduce this overlap to the problem. We often talk about these intelligence systems making decisions that we then evaluate according to moral criteria, because, as you've pointed out, those decisions are implicitly loaded with these ethically laden judgments. And in some of the things you've worked on, like Delphi, it comes across more explicitly, as something like: can I interrogate this language model explicitly about moral issues? Let's go ahead and talk a little bit about how we actually measure these systems, just as we did for common sense intelligence more generally.