
Roman Yampolskiy on Objections to AI Safety

Future of Life Institute Podcast


How to Improve Mechanistic Interpretability

The more we can help the system understand how it works, the more likely it is to start some sort of self-improvement cycle. One response to that concern is to keep discoveries in mechanistic interpretability private, to basically not publish those discoveries. So again, I have mostly problems and very few solutions for you. What about the reinforcement learning from human feedback paradigm? Could that also perhaps turn out to increase capabilities here? It's less likely to agree to be shut down verbally, but that seems to be the pattern, he says.

