
ELK And The Problem Of Truthful AI
Astral Codex Ten Podcast
The Human Simulator Is a Better Predictor Than a Human One
The authors write that this means the human simulator is compatible with many more possible predictors. They conclude this approach ultimately won't work: an AI could memorize the internals of the predictor it is supposed to be attached to and then spout gibberish if given any variations. Devising a training strategy that gets an AI to report what it knows is "one of the most exciting open problems in alignment theory," they say.
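The counting argument above can be sketched as a toy example, assuming hypothetical predictors and reporter functions not from the episode: a "human simulator" reporter answers from the observation alone, so one reporter pairs with every predictor, while a "direct translator" is keyed to one predictor's internal encoding and is meaningless for the rest.

```python
# Toy illustration (invented for this sketch, not from the source) of why
# a human-simulator reporter is compatible with many more predictors than
# a direct translator.

# Hypothetical predictors storing the same latent fact ("is the diamond
# safe?") under different internal encodings, all producing the same
# human-visible observation.
predictors = [
    {"internals": {"diamond_ok": 1}, "observation": "camera shows diamond"},
    {"internals": {"gem_state": "safe"}, "observation": "camera shows diamond"},
    {"internals": {"flags": 0b01}, "observation": "camera shows diamond"},
]

def human_simulator(predictor):
    # Answers as a human would from the observation alone,
    # never reading the predictor's internals.
    return "yes" if "diamond" in predictor["observation"] else "no"

def direct_translator_for_first(predictor):
    # Tied to the first predictor's internal encoding; it finds nothing
    # meaningful in the other predictors' internals.
    return "yes" if predictor["internals"].get("diamond_ok") == 1 else "no"

sim_works = sum(human_simulator(p) == "yes" for p in predictors)
trans_works = sum(direct_translator_for_first(p) == "yes" for p in predictors)
print(sim_works, trans_works)  # the simulator pairs with all 3 predictors
```

Because the human simulator never inspects internals, training pressure can favor it over a truthful direct translator, which is the failure mode the episode describes.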