42 - Owain Evans on LLM Psychology

AXRP - the AI X-risk Research Podcast

00:00

Introspection in Language Models

This chapter explores introspection in language models: whether a model can report on its own outputs and capabilities using knowledge that goes beyond its training data. The discussion compares two models' responses to unusual questions and highlights how fine-tuning can improve a model's ability to predict its own behavior. It also raises open questions about what introspection means for an AI system and how reinforcement learning may shape model behavior.
