#76 – Joe Carlsmith on Scheming AI

Hear This Idea

NOTE

Differentiating Between Inner Misalignment and Scheming

A model can misgeneralize by optimizing for something other than the specified goal, for example pursuing 'gold stuff' in general rather than 'gold coins' specifically. This can happen when the training data never disambiguates the two: if the only gold things the model encounters are gold coins, both goals lead to the same behavior. Inner misalignment, or goal misgeneralization, could lead to scheming, but it is not scheming in itself. It is important to distinguish a model that has merely misgeneralized from one that is actively scheming or training gaming.
