AXRP - the AI X-risk Research Podcast

23 - Mechanistic Anomaly Detection with Mark Xu

Jul 27, 2023
Chapters
1
Introduction
00:00 • 3min
2
The Problem With Mechanistic Anomalies
02:51 • 3min
3
The Importance of Rationality in AI
05:33 • 2min
4
The Future of AI Alignment
07:08 • 2min
5
How to Rule in Bad Behavior
09:29 • 5min
6
In-Distribution and Out-of-Distribution Anomalies
14:25 • 2min
7
The Importance of Mechanistic Anomalies in Training
16:43 • 2min
8
The Problem With Mechanistic Anomalies in AI
18:13 • 4min
9
The Importance of Rationality in AI
22:00 • 2min
10
The Importance of Mechanistic Anomaly Detection
23:59 • 2min
11
How to Analogize AI's Actions
26:05 • 2min
12
The Importance of Mechanistic Anomaly Detection
27:36 • 3min
13
The Mechanistic Anomaly of Diamonds
30:33 • 2min
14
The Importance of Predicting the Future
32:19 • 5min
15
The Diamond: A Metaphor for AI's Ability to Take Action
37:17 • 2min
16
The Problem With Mechanistic Anomaly Detection
38:58 • 3min
17
How to Drop Out Mechanisms in AI Training
42:13 • 4min
18
The Mechanism That Causes Both Sensors to Be On
45:48 • 2min
19
The Different Mechanisms That Explain Noise and Sensor One
47:59 • 4min
20
The Mechanisms of Manipulation of Noise
52:01 • 2min
21
How to Know if Your AI Wants to Make Things Happen
53:52 • 3min
22
How to Predict a Behavior
57:17 • 2min
23
AI's Potential to Improve Human Sensitivity
59:15 • 4min
24
How to Improve Sensor Readings With AI
01:02:53 • 2min
25
The Limits of Mechanistic Anomaly Detection
01:04:59 • 2min
26
In-Distribution vs. Out-of-Distribution Anomaly Detection
01:07:25 • 3min
27
The Importance of Interpretability in Mechanistic Anomalies
01:10:07 • 3min
28
Formalizing the Presumption of Independence
01:12:38 • 5min
29
Heuristic Arguments for Mechanistic Anomalies
01:17:47 • 2min
30
The Energy Argument in Physics
01:20:02 • 3min
31
The Role of Heuristics in Acoustical Arguments
01:22:58 • 2min
32
The Presumption of Independence in Stiff Simulations
01:24:52 • 2min
33
Heuristic Arguments Give "the Wrong Answer"
01:27:17 • 4min
34
The Heuristic Argument for the N = 3 Case
01:31:00 • 2min
35
How to Maximize the Entropy of Your Probability Distribution
01:32:49 • 4min
36
The Maximum Entropy Distribution of a Circuit
01:37:16 • 2min
37
The Inevitable Property of Maximum Entropy
01:39:03 • 3min
38
The Robustness of Heuristic Estimators
01:41:49 • 3min
39
The Impossibility of Being Adversarially Robust in AI
01:44:40 • 3min
40
The Heuristic Estimation of Quantity
01:48:10 • 2min
41
How to Deal With Adversarial Robustness in the Search Process
01:50:35 • 2min
42
Heuristic Estimates for Deficient Quantities
01:53:02 • 2min
43
Heuristic Arguments for Neural Nets
01:54:54 • 2min
44
How to Be a Good Heuristic Estimator
01:57:22 • 3min
45
How to Formalize Heuristic Arguments to Make Them Findable
02:00:15 • 2min
46
Redwood's Experimental Work on Mechanistic Anomalies
02:01:55 • 2min
47
The Importance of Probability in Research
02:03:28 • 2min