
'Simulators' by Janus

LessWrong (Curated & Popular)

Alignment Theory and Agentic AI

Alignment theory has been shaped largely by considerations of agentic AGIs. Agents are convergently dangerous for theoretical reasons like instrumental convergence, Goodhart's law, and orthogonality. RL creates agents, and RL seems to be the way to AGI. It's a reasonable guess that humans will create such systems, if only because we can. What pops out of self-supervised predictive training is noticeably not a classical agent.

