"When can we trust model evaluations?" by evhub

LessWrong (Curated & Popular)

How to Make Behavioral IID Fine-Tuning Evaluations Trustworthy

A behavioral IID fine-tuning evaluation tests the model's behavior on some particular distribution. It involves first fine-tuning the model on the exact same task that you want to evaluate it on, using data that is independently and identically distributed (IID) relative to the evaluation data. The key requirement is that there is no gradient hacking; since the model is fine-tuned directly on the task data, there is no exploration involved in this case.
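To make the procedure concrete, here is a minimal toy sketch, assuming a tiny logistic-regression classifier stands in for the model and a synthetic labeled dataset stands in for the task. All names (`iid_split`, `fine_tune`, `iid_fine_tuning_eval`) are hypothetical illustrations, not part of the episode or any library. The IID property comes from shuffling one pool of examples once and cutting it into a fine-tuning split and a held-out evaluation split.

```python
import math
import random

def iid_split(examples, eval_fraction, seed=0):
    """Shuffle one pool once, then cut it: both splits come from the same distribution."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]  # (fine-tuning split, eval split)

def predict(weights, bias, x):
    """Sigmoid output of a toy linear model (hypothetical stand-in for the real model)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune(weights, bias, train, lr=0.5, epochs=200):
    """Full-batch gradient descent on log loss over fixed labels (no sampling, so no exploration)."""
    weights = list(weights)
    n = len(train)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in train:
            err = predict(weights, bias, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def accuracy(weights, bias, data):
    """Fraction of examples where the thresholded prediction matches the label."""
    correct = sum((predict(weights, bias, x) >= 0.5) == (y == 1) for x, y in data)
    return correct / len(data)

def iid_fine_tuning_eval(base_weights, base_bias, examples, eval_fraction=0.25):
    """Fine-tune on one IID split, then measure behavior on the held-out IID split."""
    train, held_out = iid_split(examples, eval_fraction)
    weights, bias = fine_tune(base_weights, base_bias, train)
    return accuracy(weights, bias, held_out)

# Synthetic toy task (assumption): label is 1 when x0 + x1 > 0.
rng = random.Random(1)
data = []
for _ in range(400):
    x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    data.append((x, 1 if x[0] + x[1] > 0 else 0))

score = iid_fine_tuning_eval([0.0, 0.0], 0.0, data)
print(f"held-out accuracy after fine-tuning: {score:.2f}")
```

Because the training signal here is a fixed set of labels rather than outputs the model samples for itself, the sketch has no exploration step, which mirrors the point above; what it cannot capture is whether a real model would resist the gradient updates.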
