
What is AI Alignment?
AI Safety Fundamentals
Exploring Failures and Reward Functions in AI Alignment
This chapter examines the ambiguity of what counts as a failure in AI alignment and the role of outer alignment in reinforcement learning systems. Shard theory is introduced as a lens on how reward functions relate to the behaviors an agent actually learns, motivating the proposal to design reward functions around the values we want to impart.
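As an illustrative sketch (not from the episode), the gap between a task-only reward and a value-laden reward can be shown with a toy example. All names here are hypothetical; the point is that a proxy reward for "reach the goal" can rate an unsafe outcome highly, while a reward that also encodes an imparted value (avoiding hazards) disagrees.

```python
# Hypothetical toy example: two reward functions for a gridworld-style state.
# A state is a dict with flags such as "at_goal" and "on_hazard".

def proxy_reward(state):
    """Rewards only task completion; ignores the 'safety' value."""
    return 1.0 if state["at_goal"] else 0.0

def value_aligned_reward(state):
    """Also encodes the imparted value of avoiding hazards."""
    r = 1.0 if state["at_goal"] else 0.0
    if state["on_hazard"]:
        r -= 10.0  # the penalty expresses a value, not just the task
    return r

# An outcome that completes the task but violates the value:
risky_win = {"at_goal": True, "on_hazard": True}
print(proxy_reward(risky_win))          # 1.0: the proxy calls this success
print(value_aligned_reward(risky_win))  # -9.0: the value-laden reward does not
```

The mismatch between these two functions is a minimal picture of the outer-alignment concern the chapter raises: the reward we write down may not capture the values we intend the agent to learn.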