
"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland

LessWrong (Curated & Popular)


Anthropic Fine Tuned a Language Model

The automated metric is an interesting idea, but I am most excited about getting a much better understanding of the situational awareness of current models through rigorous human evaluation. The project also might produce compelling examples of attempts at misaligned power-seeking in LLMs that could be very useful for field building, convincing ML researchers. If this worked, it would be hugely valuable. Even if it doesn't get people to slow down, it might help inform alignment research about likely failure modes for LLMs. Anthropic: LLM alignment. Anthropic fine-tuned a language model to be more helpful, honest, and harmless (HHH). Motivation: I think the

