Reward Modeling for Prosaic Alignment

I think we still have a lot of work to do in terms of educating people in academia about like the safety concerns and, um, you know, winning hearts and minds is how I put it. And that's especially true for machine learning within the existential safety community and vice versa. So I'm pretty pessimistic because I don't think the technical approaches can be solved by these kinds of systems. We're going to need some ability to coordinate and say, "Let's not pursue this path or let's not deploy these types of systems" That kind of system seems really high and doesn't seem safe at all.

Play episode from 01:23:00

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app