
Hard Fork

Google Eats Rocks + A Win for A.I. Interpretability + Safety Vibe Check

May 31, 2024
01:19:20
Josh Batson, a researcher at the A.I. startup Anthropic, discusses how an experiment with the chatbot Claude and the Golden Gate Bridge represents a breakthrough in understanding large language models. The episode also covers recent developments in A.I. safety, including Google's A.I. controversies and the withdrawal of OpenAI's new voice assistant over safety concerns.

Podcast summary created with Snipd AI

Quick takeaways

  • Understanding how large language models organize concepts through dictionary learning is crucial for interpreting AI behavior.
  • Large language models represent nuanced conceptual associations, which lets them draw strong analogies across related, multifaceted ideas.

Deep dives

Understanding the Inner Workings of Large Language Models

Using a method called dictionary learning, researchers uncovered patterns in large language models that reveal how the models organize concepts such as entities, styles of poetry, and ways of responding to questions. These patterns correspond to identifiable features tied to real-world concepts, giving a clearer picture of how the models represent what they "know."
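
The dictionary-learning idea described above can be illustrated with a short sketch. The example below uses scikit-learn's generic DictionaryLearning on synthetic vectors; it only shows the core idea of decomposing dense activation vectors into a small set of sparsely used directions, and is not Anthropic's actual sparse-autoencoder pipeline. All dimensions and variable names here are made up for illustration.

# Illustrative sketch only: decompose synthetic "activation" vectors into a
# sparse combination of learned directions (a "dictionary" of features).
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)

# Pretend each row is one token's activation vector, secretly built from a
# handful of hidden "concept" directions plus a little noise.
n_samples, n_dims, n_concepts = 500, 64, 12
true_features = rng.normal(size=(n_concepts, n_dims))
usage = rng.random((n_samples, n_concepts)) * (rng.random((n_samples, n_concepts)) < 0.1)
activations = usage @ true_features + 0.01 * rng.normal(size=(n_samples, n_dims))

# Dictionary learning: find directions such that each activation is a sparse
# combination of only a few of them.
dl = DictionaryLearning(n_components=n_concepts, alpha=0.5,
                        transform_algorithm="lasso_lars", random_state=0)
codes = dl.fit_transform(activations)   # sparse coefficients per activation
dictionary = dl.components_             # learned feature directions

print("mean nonzero features per activation:", (codes != 0).sum(axis=1).mean())

In the work discussed in the episode, the activations come from Claude's internal layers, and each learned direction is interpreted by inspecting the text that most strongly activates it, which is how features like the Golden Gate Bridge feature were identified.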
