Exploring Robust Forms of Practical P.S. Alignment and Controlling Capabilities

This chapter explores robust forms of practical P.S. alignment in AI systems. It covers searching for problematic objectives, reading objectives from a model's internal states, and using short-term objectives to mitigate power-seeking behavior. It also examines why myopia is hard to ensure when certain search techniques or long-term training processes are involved.
