Anthropic Fine-Tuned a Language Model

The automated metric is an interesting idea, but I am most excited about getting a much better understanding of the situational awareness of current models through rigorous human evaluation. The project also might produce compelling examples of attempts at misaligned power-seeking in LLMs that could be very useful for field building and for convincing ML researchers. If this worked, it would be hugely valuable. Even if it doesn't get people to slow down, it might help inform alignment research about likely failure modes for LLMs.

Anthropic LLM alignment: Anthropic fine-tuned a language model to be more helpful, honest, and harmless (HHH). Motivation: I think the

Play episode from 19:34

