
"When can we trust model evaluations?" by evhub
LessWrong (Curated & Popular)
How to Craft a Good Governance Scheme Around Model Evaluations
In my opinion, if you want to craft a good governance scheme around model evaluations, you're going to need both capabilities and alignment evaluations. A very simplified scheme here could be something like: 1. Do a bunch of capabilities evaluations for various risks (there's an indented list here). If we believe that the scaling laws for our alignment evaluations are such that we're confident that the next model will be aligned, then it's fine to train. Otherwise, don't train any larger models. Now we'll look at different evaluations and see how they can help us in evaluating capabilities and/or alignment.
Transcript


