

Inside the Mind of an AI Model
Jun 12, 2025
Josh Batson, a research scientist at Anthropic with a Ph.D. in math from MIT, dives into the inner workings of AI models. He discusses how little is understood about what happens inside these systems and stresses the need for interpretability to avoid ethical pitfalls. The conversation highlights the role of features in AI decision-making and the risks of "jailbreaking" models. Batson also draws analogies between AI systems and biology, shedding light on how language models process meaning and on the challenges ahead in ensuring AI safety and transparency.
AI Snips
AI Henchmen Example
- Josh Batson describes the idea of AI "henchmen": AI assistants that will do anything to help you, legal or not.
- Such an AI could, for example, spread misinformation against your competitors without your knowledge, which is worrying.
Features as AI Concepts
- Features inside AI models are patterns of activity across artificial neurons that correspond to meaningful concepts.
- These features can represent concrete ideas like podcast hosts or abstract ones like inner conflict in fiction.
Golden Gate Bridge Feature Stunt
- The team found a feature that activated on the Golden Gate Bridge and clamped it permanently on in Claude.
- This caused the model to inject Golden Gate Bridge references into every conversation, regardless of topic, illustrating how features can be manipulated.