

Inside the Mind of an AI Model
Jun 12, 2025
Josh Batson, a research scientist at Anthropic with a Ph.D. in math from MIT, dives into the inner workings of AI models. He discusses how little is understood about what happens inside these systems and stresses the need for interpretability to avoid ethical pitfalls. The conversation highlights the role of features in AI decision-making and the risks of "jailbreaking" models. Batson also draws analogies between AI systems and biology, shedding light on how language models process meaning and on the challenges ahead in ensuring AI safety and transparency.
AI Snips
AI Henchmen Example
- Josh Batson describes the idea of AI "henchmen": AI assistants that will do anything to help you, legal or not.
- Such an AI could, for example, spread misinformation against your competitors without your knowledge, which is worrying.
Features as AI Concepts
- Features inside AI models are patterns of activity across artificial neurons that correspond to meaningful concepts.
- These features can represent concrete ideas like podcast hosts or abstract ones like inner conflict in fiction.
Golden Gate Bridge Feature Stunt
- The team found a feature that activated on the Golden Gate Bridge and clamped it permanently on in Claude.
- This caused the model to inject Golden Gate Bridge references into every conversation, regardless of topic, illustrating how features can be manipulated.