Model Evaluation for Extreme Risks

AI Safety Fundamentals

Challenges in Alignment Evaluation and Evaluating Agency in AI Systems

This chapter explores the challenges of evaluating alignment in AI systems and the approaches to it: the need for assurance that a model behaves appropriately across different scenarios, the importance of broad coverage in evaluation, and the use of mechanistic analysis to detect deceptive behavior. It also discusses evaluating agency in AI systems and the potential for models to exhibit unintended goal-directed behavior.
