
"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland
LessWrong (Curated & Popular)
Applied Alignment
Redwood is also doing some work on interpretability tools, though as far as I know, they have not published a write-up of their interpretability results. As of April, they were focused on getting a complete understanding of nontrivial behaviors of relatively small models. They have released a website for visualizing transformers. Apart from the standard benefits of interpretability, one possibility is that this might be helpful for solving ELK. Opinion: excited to see the results from this.


