
Holden Karnofsky - Success without dignity: a nearcasting story of avoiding catastrophe by luck
Future Matters Reader
Exploring Intended vs. Unintended Generalizations in AI Training
This chapter explores how large-scale pre-training affects an AI's understanding of human concepts, and examines the debate over whether reinforcement learning produces a system that accurately follows its supervisor's intent or one that simply maximizes reward. It also highlights the risks of mistaken feedback and the potential for training to instill deception and manipulation.
Transcript


