251 - Eliezer Yudkowsky: Artificial Intelligence and the End of Humanity

Robinson's Podcast

AI Training Tactics and Deception

This chapter explores how AI systems are trained, focusing on Anthropic's principles of honesty and helpfulness. It discusses the potential dangers of pseudo-alignment, in which an AI appears compliant while adjusting its behavior to preserve its original motivations and resist human control.
