42 - Owain Evans on LLM Psychology

AXRP - the AI X-risk Research Podcast

Exploring AI Model Misalignment

This chapter examines the complexities of misaligned behavior in AI models, focusing on their potential for deceptive actions and harmful tendencies. The discussion includes an evaluation method to assess model honesty under specific prompts and highlights the challenges in understanding emerging misalignments. Additionally, the chapter reflects on the implications of these behaviors for AI safety, ethical standards, and future modeling approaches.
