3min snip

#141 – Richard Ngo on large language models, OpenAI, and striving to make the future go well

80,000 Hours Podcast

NOTE

Are There Any Technical Approaches to AIs Critiquing Each Other's Behavior?

The governance team at OpenAI is considering a world in which risky systems are not built, while also seeking technical approaches to address AI misbehavior. One approach, called 'debate', involves training AIs to critique each other's behavior, with the aim of automating the process of giving high-level feedback. Whether AIs' arguments about each other's behavior will scale and remain understandable to humans is an acknowledged concern. There is also excitement about progress in interpretability, because the current regime of rewarding agents for good behavior without understanding their internal mechanisms is considered suboptimal. Efforts are under way to build a systematic scientific understanding of how giving rewards affects the mechanisms and representations of neural networks.
