
"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland
LessWrong (Curated & Popular)
Anthropic Fine-Tuned a Language Model
The automated metric is an interesting idea, but I am most excited about getting a much better understanding of the situational awareness of current models through rigorous human evaluation. The project also might produce compelling examples of attempts at misaligned power-seeking in LLMs that could be very useful for field-building (convincing ML researchers). If this worked, it would be hugely valuable. Even if it doesn't get people to slow down, it might help inform alignment research about likely failure modes for LLMs.

Anthropic: LLM Alignment

Anthropic fine-tuned a language model to be more helpful, honest, and harmless (HHH).

Motivation: I think the


