"When can we trust model evaluations?" by evhub

LessWrong (Curated & Popular)

How to Craft a Good Governance Scheme Around Model Evaluations

In my opinion, if you want to craft a good governance scheme around model evaluations, you're going to need both capabilities and alignment evaluations. A very simplified scheme could look something like this: 1. Do a bunch of capabilities evaluations for various risks (the original post continues with an indented list at this point). If we believe that the scaling laws for our alignment evaluations are such that we're confident the next model will be aligned, then it's fine to train. Otherwise, don't train any larger models. Now we'll look at different evaluations and see how they can help us evaluate capabilities and/or alignment.
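The scheme above can be read as a small decision gate. The following is a minimal sketch only, not the author's specification: the names `EvalForecast` and `ok_to_train_next_model` are hypothetical, and the capabilities branch is an assumption, since the summary does not spell out what happens after the capabilities evaluations. Here it is assumed that alignment evaluations only matter once the next model might be capable enough to cause harm.

```python
from dataclasses import dataclass


@dataclass
class EvalForecast:
    """Hypothetical summary of what the evaluation scaling laws predict
    for the next, larger model. Both fields are illustrative."""
    # Assumption: capabilities evals suggest the next model might be
    # capable of causing serious harm if it were trying to.
    might_be_dangerously_capable: bool
    # From the summary: we are confident the next model will be aligned.
    confident_next_model_aligned: bool


def ok_to_train_next_model(forecast: EvalForecast) -> bool:
    """Return True if the simplified scheme permits training a larger model."""
    if not forecast.might_be_dangerously_capable:
        # Assumed branch (not stated in the summary): a model that could
        # not cause harm even if it tried is fine to train.
        return True
    # Stated branch: train only if the alignment evaluations' scaling laws
    # make us confident the next model will be aligned; otherwise stop.
    return forecast.confident_next_model_aligned


if __name__ == "__main__":
    risky = EvalForecast(might_be_dangerously_capable=True,
                         confident_next_model_aligned=False)
    print(ok_to_train_next_model(risky))
```

The gate is deliberately conservative: when the capabilities forecast is worrying, the only way to proceed is positive confidence in alignment, so an inconclusive alignment evaluation blocks training.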
