
110. Alex Turner - Will powerful AIs tend to seek power?

Towards Data Science


Are You a Superintelligent AI?

I'd love to know from you if you agree with this characterization. But it seems to me, then, that this is a sort of two-part argument. What about reward functions specified by humans with the intention of being specifically good? Ah, then that kind of fails due to all the standard alignment arguments around Goodhart's law and it being impossible to specify metrics that aren't hackable. And then what your paper does is it takes that and says, yes, and furthermore, one of the specific failure modes that we can anticipate leading to the disaster is related to power seeking. In other words, the failure will in part manifest through a

