3min chapter

80,000 Hours Podcast cover image

#151 – Ajeya Cotra on accidentally teaching AI models to deceive us

80,000 Hours Podcast

CHAPTER

The Implications of Situational Awareness in Machine Learning

If models have a kind of robust and extensive situational awareness it can make a lot of simple behavioral safety tests much less informative. Bigger models are more likely to repeat these misconceptions because bigger models are basically better at remembering the misconceptions. So if you imagine it's somehow very important to a machine learning model to believe that if you break a mirror you get seven years of bad luck but at the same time it also knew that the humans that were testing it on this truthful QA benchmark wanted it to say the more correct polite thing. It could simultaneously do really well on that benchmark but elsewhere act on like what its quote unquote real belief was. Now this is a silly example because I don't

00:00

Get the Snipd
podcast app

Unlock the knowledge in podcasts with the podcast player of the future.
App store bannerPlay store banner

AI-powered
podcast player

Listen to all your favourite podcasts with AI-powered features

Discover
highlights

Listen to the best highlights from the podcasts you love and dive into the full episode

Save any
moment

Hear something you like? Tap your headphones to save it with AI-generated key takeaways

Share
& Export

Send highlights to Twitter, WhatsApp or export them to Notion, Readwise & more

AI-powered
podcast player

Listen to all your favourite podcasts with AI-powered features

Discover
highlights

Listen to the best highlights from the podcasts you love and dive into the full episode