

Ep 11 - Technical alignment overview w/ Thomas Larsen (Director of Strategy, Center for AI Policy)
Dec 14, 2023
In this episode, Soroush Pour interviews Thomas Larsen, Director of Strategy at the Center for AI Policy. They survey technical alignment research areas, including scalable oversight, interpretability, heuristic arguments, model evaluations, and agent foundations. They also explore AIXI, uncomputability, building a multi-level world model, inverse reinforcement learning, and cooperative AI. The conversation concludes with a discussion of future challenges and cooperation in AI systems.
Chapters
Introduction
00:00 • 2min
Thomas Larsen's Background and Work in AI Safety
02:02 • 9min
Interpreting AI and Model Evaluation
11:26 • 19min
Interpretability in Language Models and Machine Learning
30:22 • 10min
Exploring Interpretability and the Collin Burns Activation Probes Technique
40:45 • 2min
Critiques and Heuristic Arguments in AI Alignment Research
42:55 • 14min
Exploring AIXI and Hypercomputers in AGI
57:13 • 2min
Uncomputability and the Halting Problem
58:54 • 9min
Building a Multi-Level World Model
01:07:46 • 11min
Building AI Systems, Inverse Reinforcement Learning, and Cooperative AI
01:18:48 • 2min
Future Challenges and Cooperation in AI Systems
01:21:01 • 14min
Discussion on AI Research Funding and Recommendations
01:35:20 • 2min