
ELK And The Problem Of Truthful AI
Astral Codex Ten Podcast
The Human Simulator Is a Better Predictor Than a Human One
The authors write that this means the human simulator is compatible with many more possible predictors. They conclude this approach ultimately won't work: an AI could memorize the internals of the predictor it is supposed to be attached to and then spout gibberish if given any variations. Devising a training strategy that gets an AI to report what it knows is "one of the most exciting open problems in alignment theory," they say.
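The counting argument above can be sketched as a toy example, assuming hypothetical predictors and reporter functions not from the episode: a "human simulator" reporter answers from the observation alone, so one reporter pairs with every predictor, while a "direct translator" is keyed to one predictor's internal encoding and is meaningless for the rest.

```python
# Toy illustration (invented for this sketch, not from the source) of why
# a human-simulator reporter is compatible with many more predictors than
# a direct translator.

# Hypothetical predictors storing the same latent fact ("is the diamond
# safe?") under different internal encodings, all producing the same
# human-visible observation.
predictors = [
    {"internals": {"diamond_ok": 1}, "observation": "camera shows diamond"},
    {"internals": {"gem_state": "safe"}, "observation": "camera shows diamond"},
    {"internals": {"flags": 0b01}, "observation": "camera shows diamond"},
]

def human_simulator(predictor):
    # Answers as a human would from the observation alone,
    # never reading the predictor's internals.
    return "yes" if "diamond" in predictor["observation"] else "no"

def direct_translator_for_first(predictor):
    # Tied to the first predictor's internal encoding; it finds nothing
    # meaningful in the other predictors' internals.
    return "yes" if predictor["internals"].get("diamond_ok") == 1 else "no"

sim_works = sum(human_simulator(p) == "yes" for p in predictors)
trans_works = sum(direct_translator_for_first(p) == "yes" for p in predictors)
print(sim_works, trans_works)  # the simulator pairs with all 3 predictors
```

Because the human simulator never inspects internals, training pressure can favor it over a truthful direct translator, which is the failure mode the episode describes.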