AI's Alignment

I think it's hard to fully evaluate a model if you're not an expert in the domain. One thing we can do is what they did for InstructGPT, where they had a bunch of labelers giving evaluations of how much they preferred one output over the others. From that, you can roughly build a reward model that says whether an output is good or not. I'm talking really long term here, so this is probably well after AGI. At that point, my guess is that we'll have to augment our own intellectual capacity with something new, or something like that. We become the AI itself.
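The idea of turning labelers' pairwise preferences into a reward model can be sketched roughly as follows. This is only a minimal illustration, not the method any particular lab used: it assumes each output is already summarized as a small feature vector, it uses a linear scorer trained with a Bradley-Terry style objective, and the feature names, toy data, and function names are all hypothetical.

```python
import math


def reward(weights, features):
    """Score an output: higher means the model thinks labelers would like it more."""
    return sum(w * x for w, x in zip(weights, features))


def train_reward_model(pairs, n_features, lr=0.1, epochs=200):
    """Fit a linear reward model from (preferred, rejected) feature pairs.

    Uses a Bradley-Terry style objective: the probability that the preferred
    output wins is sigmoid(reward(preferred) - reward(rejected)).
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in pairs:
            diff = reward(weights, preferred) - reward(weights, rejected)
            p_win = 1.0 / (1.0 + math.exp(-diff))
            # Gradient ascent on log p_win with respect to the weights.
            for i in range(n_features):
                weights[i] += lr * (1.0 - p_win) * (preferred[i] - rejected[i])
    return weights


# Hypothetical toy data: each output is [helpfulness, rudeness],
# and the first item of each pair is the one the labeler preferred.
pairs = [
    ([0.9, 0.1], [0.2, 0.8]),
    ([0.7, 0.0], [0.6, 0.9]),
    ([0.8, 0.2], [0.1, 0.3]),
]

weights = train_reward_model(pairs, n_features=2)
print(reward(weights, [0.9, 0.1]) > reward(weights, [0.2, 0.8]))
```

A real system would replace the hand-made feature vectors with a learned representation of the text, but the comparison-based training signal is the same in spirit.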

