How Do You Measure the Performance of a Language Model?

I like the way that you introduce this overlap to the problem. We often talk about these intelligent systems making decisions that we then evaluate according to moral criteria, because, as you've pointed out, those decisions are implicitly loaded with ethically laden judgments. And in some of the things you've worked on, like Delphi, it comes across more explicitly as something like: can I interrogate this language model explicitly about moral issues? Let's go ahead and talk a little bit about how we actually measure these systems, just as we did for common sense intelligence more generally.

