Alignment Problems

I have a pretty broad range of credences over how hard the alignment problem might be. There's a reasonable range in which interpretability, or something equivalently powerful, is just necessary. There are also ranges in which it isn't. So I don't really have a strong estimate of how the ratio between those will work out. Another thing I guess we could talk about is this idea of AI cooperation. I think Vincent Nisers actually recently started a seminar series on this. I believe Andrew Critch wrote something basically arguing that, look, even if you solve AI
