AI Safety Fundamentals: Alignment

Can We Scale Human Feedback for Complex AI Tasks?

Mar 26, 2024
Exploring the challenges of using human feedback to train AI models, strategies for scalable oversight, techniques such as task decomposition and reward modeling, Recursive Reward Modeling and Constitutional AI, the use of debating agents to break down complex problems, and improving generalization in AI models through weaker supervisors, along with a discussion of scalability challenges.