#151 – Ajeya Cotra on accidentally teaching AI models to deceive us

80,000 Hours Podcast

Navigating AI Alignment Challenges

This chapter examines iterated amplification and the handoff frame as approaches to AI alignment, and why they matter given the rising sense of urgency inside AI labs. The discussion covers skepticism toward certain AI training methodologies and the complexities of AI deception, emphasizing the need for robust empirical testing. It also addresses the evolving landscape of AI evaluations and why understanding distributional shifts is important for improving models' safety and performance.
