Infra-Bayesian Physicalism (IBP)

IBP allows agents to have abstract models of themselves, and thus works in an embedded setting. Vanessa uses this formalism to describe PreDCA, an alignment proposal based on IBP. Vanessa models an AI as a model-based RL system with a world model (WM), a reward function, and a policy derived from the WM plus the reward. She claims that this avoids the sharp left turn. The generalization problems come from the world model. Whether this proposal also solves inner alignment is an open question, but there is some chance that it does.
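To make the three-part structure concrete, here is a minimal sketch of a generic model-based agent: a world model, a reward function, and a policy computed from the two by planning. This is not PreDCA or IBP itself. The toy chain environment, the function names (`world_model`, `reward`, `derive_policy`), and the use of finite-horizon value iteration are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

State = int
Action = str

# Hypothetical toy world: a chain of states 0..3.
STATES: List[State] = [0, 1, 2, 3]
ACTIONS: List[Action] = ["left", "right"]


def world_model(state: State, action: Action) -> State:
    """Assumed deterministic world model: move one step, clamped to the chain."""
    step = 1 if action == "right" else -1
    return min(max(state + step, 0), 3)


def reward(state: State, action: Action) -> float:
    """Assumed reward function: 1.0 for landing on state 3, otherwise 0.0."""
    return 1.0 if world_model(state, action) == 3 else 0.0


def derive_policy(
    states: List[State],
    actions: List[Action],
    model: Callable[[State, Action], State],
    rew: Callable[[State, Action], float],
    horizon: int = 10,
    gamma: float = 0.9,
) -> Dict[State, Action]:
    """Derive a policy from the world model plus reward via value iteration."""
    values = {s: 0.0 for s in states}
    for _ in range(horizon):
        # Back up each state's value through the world model.
        values = {
            s: max(rew(s, a) + gamma * values[model(s, a)] for a in actions)
            for s in states
        }
    # The policy picks, in each state, the action with the best backed-up value.
    return {
        s: max(actions, key=lambda a: rew(s, a) + gamma * values[model(s, a)])
        for s in states
    }


policy = derive_policy(STATES, ACTIONS, world_model, reward)
print(policy)
```

In this sketch the policy has no parameters of its own: everything it does is determined by the world model and the reward. That is the sense in which, in such a design, errors in generalization would have to originate in the world model.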
