Challenges in Alignment Evaluation and Evaluating Agency in AI Systems
This chapter explores the challenges of, and approaches to, evaluating alignment in AI systems. These include the need for assurance that a system behaves appropriately across different scenarios, the importance of broad coverage in evaluation, and the use of mechanistic analysis to detect deceptive behavior. It also discusses how to evaluate agency in AI systems and the potential for models to exhibit unintended goal-directed behavior.