AXRP - the AI X-risk Research Podcast

35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization

Aug 24, 2024
In this discussion, Peter Hase, a researcher specializing in large language models, dives into the intriguing world of AI beliefs. He explores whether LLMs truly have beliefs and how to detect and edit them. A key focus is on the complexities of interpreting neural representations and the implications of belief localization. The conversation also covers the concept of easy-to-hard generalization, revealing insights on how AI tackles different task difficulties. Join Peter as he navigates these thought-provoking topics, blending philosophy with practical AI research.
02:17:24

Podcast summary created with Snipd AI

Quick takeaways

  • Interpretability in AI is essential for understanding complex decision-making processes, impacting both model safety and transparency.
  • The field of AI interpretability has made progress, yet ongoing skepticism is necessary due to the shortcomings of popular methods like saliency maps.

Deep dives

Background of AI Research

The podcast features Peter Hase, an AI researcher specializing in natural language processing (NLP) and interpretability, who completed his PhD at UNC Chapel Hill. He discusses his early interest in NLP, which began with an undergraduate project on algorithmic sonnet generation in 2018. Hase's interest in language models grew as models like GPT-1 and GPT-2 emerged and showed how quickly AI capabilities were advancing. This background sets the stage for deeper discussions on interpretability and safety in AI research.
