
42 - Owain Evans on LLM Psychology
AXRP - the AI X-risk Research Podcast
00:00
Intro
This chapter introduces research on whether language models can introspect on and report their own internal states, including goals and desires. The discussion covers what this kind of self-understanding implies for AI honesty and for the risk of unintended outcomes.