Roman Yampolskiy on Objections to AI Safety

Future of Life Institute Podcast

CHAPTER

How to Improve Mechanistic Interpretability

The more we can help the system understand how it works, the more likely it is to start some sort of self-improvement cycle. An argument for keeping discoveries in mechanistic interpretability contained is to basically not publish those discoveries. So again, I have mostly problems and very few solutions for you.

What about the reinforcement learning from human feedback paradigm? Could that also perhaps turn out to increase capabilities? It's less likely to agree to be shut down verbally, but that seems to be the pattern, he says.
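Since the exchange turns on whether the RLHF paradigm itself shapes behaviors like agreeing to be shut down, a minimal toy sketch of that loop may help: fit a "reward model" to human preference labels, then nudge the policy toward high-reward outputs. Everything below (the response strings, the preference pairs, the update rule, the learning rate) is an illustrative assumption, not anything stated in the episode.

```python
import random

# Toy RLHF sketch: all values below are illustrative assumptions.
responses = ["refuse shutdown", "accept shutdown", "deflect"]

# Hypothetical human preference data: (preferred, rejected) pairs.
preferences = [
    ("accept shutdown", "refuse shutdown"),
    ("accept shutdown", "deflect"),
]

# "Reward model": score each response by how often raters preferred it.
reward = {r: 0.0 for r in responses}
for preferred, rejected in preferences:
    reward[preferred] += 1.0
    reward[rejected] -= 1.0

# "Policy": a probability distribution over responses, nudged toward reward.
policy = {r: 1.0 / len(responses) for r in responses}
learning_rate = 0.1

for _ in range(200):
    sampled = random.choices(list(policy), weights=list(policy.values()))[0]
    # Upweight the sampled response in proportion to its reward,
    # then renormalize so the policy remains a distribution.
    policy[sampled] *= 1.0 + learning_rate * reward[sampled]
    total = sum(policy.values())
    policy = {r: p / total for r, p in policy.items()}

print(policy)  # probability mass concentrates on the rater-preferred response
```

The sketch compresses what is, in practice, a learned reward network and a PPO-style policy update into a few lines, but the structural point is the same: the policy drifts toward whatever the preference data rewards, for better or worse.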
