
Alexander Pan on the MACHIAVELLI benchmark
The Inside View
Measuring Deceptiveness and Moral Values in AI Agents
This chapter explores the challenges of integrating moral values into AI agent rewards and the trade-off between maximizing reward and acting ethically. It also discusses real-world scenarios in which companies may prioritize rewards over ethical considerations, and the deployment of large models, where ensuring they include moral components helps avoid PR issues.