21 - Interpretability for Engineers with Stephen Casper

AXRP - the AI X-risk Research Podcast

The Relationship Between Disentanglement and Policy

There are those pointers that exist, although they're not discussed in the most useful way. There has been other work on lateral inhibition and different activation functions from the disentanglement literature. And I think things are a bit richer and a bit more fleshed out on the other side of the divide between the AI safety and interpretability community and the more mainstream ML community. So yeah, the Softmax Linear Unit paper was cool. But as we continue with work like this, I think it'll be really useful to take advantage of the wealth of understanding that we have from a lot of work in the 2010s on disentanglement.
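[As a rough illustration of the activation function being discussed: the Softmax Linear Unit (SoLU) from Anthropic's paper scales each activation by its softmax weight, so large activations suppress smaller ones, a simple form of lateral inhibition. A minimal NumPy sketch, assuming the basic form SoLU(x) = x * softmax(x) without the layer normalization the paper also applies:]

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def solu(x):
    # Softmax Linear Unit: each activation is weighted by its share of
    # the softmax, so the largest activation passes through nearly
    # unchanged while the rest are pushed toward zero.
    return x * softmax(x)

acts = np.array([4.0, 1.0, 0.5, 0.1])
out = solu(acts)
# The dominant activation stays large; the others are strongly suppressed,
# which is the lateral-inhibition-like effect discussed above.
```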
