Can Language Models Tell Us When They Don't Know What They Know?
Can these types of agents tell us when they don't know something, or is that a hard problem? I'd say sort of. If you ask a question that's in the core of the model's knowledge, it will know. The training objective strongly incentivizes the model to be calibrated, meaning it has a reasonable estimate of how likely it is to be right. It doesn't feel like an insurmountable problem, but there are some practical difficulties to getting there.
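The calibration idea mentioned here can be made concrete with a standard diagnostic such as expected calibration error (ECE): bin a model's stated confidences, then compare the average confidence in each bin to how often the answers were actually correct. The sketch below is a minimal illustration with hypothetical data, not anything discussed in the episode.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence and compare average confidence
    to empirical accuracy in each bin (lower ECE = better calibrated)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight each bin by its share of samples
    return ece

# Hypothetical data: the model's stated probability of being right on each
# question, and whether its answer was actually correct.
confs = [0.95, 0.90, 0.80, 0.60, 0.55, 0.30]
right = [1, 1, 1, 1, 0, 0]
print(f"ECE = {expected_calibration_error(confs, right):.3f}")
```

A well-calibrated model's 80%-confidence answers are right about 80% of the time, which is what makes "knowing what it doesn't know" measurable in practice.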