38.3 - Erik Jenner on Learned Look-Ahead

AXRP - the AI X-risk Research Podcast

Understanding Model Behavior and X-Risk

This chapter analyzes model behavior, focusing on accuracy loss and the role of a specific attention head in transferring information for decision-making. It then connects these findings to AI X-risk, discussing the implications for model safety and interpretability and the challenges of applying results from simpler models to more complex systems.
