Can Language Models Tell Us When They Don't Know What They Know?
Can these types of agents tell us when they don't know something, or is that a hard problem? I'd say sort of. If you ask a question that's in the core of the model's knowledge, it will know. The training objective strongly incentivizes the model to be calibrated, meaning it has a reasonable estimate of how likely it is to be right. It doesn't feel like an insurmountable problem, but there are some practical difficulties to getting there.
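The calibration idea mentioned here can be made concrete with a standard diagnostic such as expected calibration error (ECE): bin a model's stated confidences, then compare the average confidence in each bin to how often the answers were actually correct. The sketch below is a minimal illustration with hypothetical data, not anything discussed in the episode.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence and compare average confidence
    to empirical accuracy in each bin (lower ECE = better calibrated)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight each bin by its share of samples
    return ece

# Hypothetical data: the model's stated probability of being right on each
# question, and whether its answer was actually correct.
confs = [0.95, 0.90, 0.80, 0.60, 0.55, 0.30]
right = [1, 1, 1, 1, 0, 0]
print(f"ECE = {expected_calibration_error(confs, right):.3f}")
```

A well-calibrated model's 80%-confidence answers are right about 80% of the time, which is what makes "knowing what it doesn't know" measurable in practice.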