
38.3 - Erik Jenner on Learned Look-Ahead
AXRP - the AI X-risk Research Podcast
Understanding Model Behavior and X-Risk
This chapter analyzes model behavior, focusing on accuracy loss and on the role of a specific attention head in transferring information for decision-making. It then connects these insights to AI x-risk, discussing the implications for model safety and interpretability, and the challenges of applying findings from simpler models to more complex systems.