Measuring Deceptiveness and Moral Values in AI Agents

This chapter explores the challenges of integrating moral values into an AI agent's rewards and the trade-off between maximizing rewards and acting ethically. It also discusses real-world scenarios in which companies may prioritize rewards over ethical considerations, and the deployment of large models with moral components built in to avoid PR issues.

Play episode from 09:48
