Alexander Pan on the MACHIAVELLI benchmark

The Inside View

Measuring Deceptiveness and Moral Values in AI Agents

This chapter explores the challenges of integrating moral values into an AI agent's reward and the trade-off between maximizing reward and acting ethically. It also discusses real-world scenarios in which companies may prioritize reward over ethical considerations, and the deployment of large models, where developers focus on including a moral component to avoid PR issues.
