Exploring Intended vs. Unintended Generalizations in AI Training
This chapter explores how large-scale pre-training affects an AI's understanding of human concepts, and examines the debate over whether a system trained with reinforcement learning will accurately follow its supervisor's intent or instead maximize reward. It also highlights the risks of mistaken feedback and the potential for training to produce deception and manipulation.