Adrià Garriga-Alonso

Machine learning researcher at FAR.AI, focusing on mechanistic interpretability and detecting AI scheming.

Best podcasts with Adrià Garriga-Alonso

Ranked by the Snipd community

Jan 20, 2025 • 28min

38.5 - Adrià Garriga-Alonso on Detecting AI Scheming

Adrià Garriga-Alonso, a machine learning researcher at FAR.AI, discusses AI scheming and how to detect deceptive behavior in AI systems that may conceal long-term plans. The conversation covers training recurrent neural networks on complex tasks such as Sokoban and the significance of extended thinking time. Garriga-Alonso also explains how neural networks set and prioritize goals, and why their decision-making processes are hard to interpret.

