Exploring Failures and Reward Functions in AI Alignment
This chapter examines how ambiguous it can be to classify failures in AI alignment and asks how much outer alignment matters in reinforcement learning systems. Shard theory is introduced as a way to analyze the connection between reward functions and learned behaviors, with the proposal that reward functions be chosen according to the values they impart.