Power Seeking and AI Risks

This chapter discusses the dangers of reward hacking and specification gaming in AI, particularly power-seeking behaviors that could pose existential risks. The speakers use thought experiments such as the paperclip maximizer to explore what misaligned AI goals imply, and they stress the importance of careful alignment work and of the orthogonality thesis.

Play episode from 18:57
