Holden Karnofsky - Success without dignity: a nearcasting story of avoiding catastrophe by luck

Future Matters Reader

Exploring Intended vs. Unintended Generalizations in AI Training

This chapter explores how large-scale pre-training shapes an AI's understanding of human concepts, and the debate over whether an AI trained with reinforcement learning learns to follow its supervisor's intent accurately or to maximize reward. It also highlights the risks of mistaken feedback and the potential for training to instill deception and manipulation.

Play episode from 04:25