2min chapter

#151 – Ajeya Cotra on accidentally teaching AI models to deceive us

80,000 Hours Podcast

CHAPTER

How to Train an AI System to Explain Why Proteins Fold a Certain Way

In theory, could we train a model that would explain why proteins are folded a particular way or explain why a particular Go move is good? I think we could totally try to do that. But it's much harder and less obvious how to train this system to have the words it's saying be like truly connected to why it's making the moves it's making. Even when you kind of try and improve this training procedure, it's not totally clear if we can actually get the system to say everything that it knows about why it's making this move. Yeah. So in the CEO business model case where you're trying to train a model to take good actions to make money, would the same
