
Adrià Garriga-Alonso
Machine learning researcher at FAR.AI, focusing on mechanistic interpretability and detecting AI scheming.
Best podcasts with Adrià Garriga-Alonso
Ranked by the Snipd community

Jan 20, 2025 • 28min
38.5 - Adrià Garriga-Alonso on Detecting AI Scheming
Adrià Garriga-Alonso, a machine learning researcher at FAR.AI, discusses AI scheming and how to detect deceptive behavior in AI systems that may be concealing long-term plans. The conversation covers training recurrent neural networks on complex tasks such as Sokoban and the importance of extended thinking time. Garriga-Alonso also explains how neural networks set and prioritize goals, and the challenges of interpreting their decision-making.