

Embryology of AI: How Training Data Shapes AI Development w/ Timaeus' Jesse Hoogland & Daniel Murfet
Jun 19, 2025
Jesse Hoogland and Daniel Murfet, co-founders of Timaeus, are pioneering an approach to AI safety built on developmental interpretability and Singular Learning Theory. They discuss the complex, jagged loss landscapes of neural networks and how their Local Learning Coefficient can identify critical phase changes during training. The approach aims to catch safety issues early and to provide a more structured methodology for AI development. Their insights reveal the intricate relationships between training data, model behavior, and alignment, and push for a principled engineering discipline in AI.
Singularities Shape Loss Landscapes
- Loss landscapes of neural networks are complex, jagged surfaces full of singularities: regions where the model's internal parameters can change without affecting its external behavior (see the sketch below).
- This internal flexibility can mask dangerous misalignment, making it hard to distinguish a fundamentally aligned model from a deceptive one.
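A minimal numpy sketch (an assumed illustration, not code from the episode or from Timaeus) of what "internal change without external change" looks like: in a two-layer linear network, rescaling one weight matrix and inversely rescaling the next moves the parameters without changing the function the network computes, so the loss is exactly flat along that direction. Degenerate directions like this are the singularities that singular learning theory studies.

```python
# Minimal sketch: two different parameter settings of a tiny two-layer linear
# network compute exactly the same function, so the loss cannot tell them apart.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))       # a few input points with 3 features

W1_a = rng.normal(size=(4, 3))    # first parameterization
W2_a = rng.normal(size=(2, 4))

c = 3.0                           # rescale the hidden layer: the weights change...
W1_b = c * W1_a
W2_b = W2_a / c                   # ...but the composed map W2 @ W1 is identical

out_a = x @ W1_a.T @ W2_a.T
out_b = x @ W1_b.T @ W2_b.T
print(np.allclose(out_a, out_b))  # True: same external behavior, different internals
```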
Phase Transitions Simplify Interpretability
- Developmental interpretability uses singular learning theory to find phase transitions during neural network training.
- These phase transitions act as meaningful units of change, simplifying interpretability by marking key developmental stages in training (see the sketch below).
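A hedged sketch of the general idea rather than Timaeus' actual tooling: if some quantity is tracked at each training checkpoint (for example an estimate of the Local Learning Coefficient, stood in for here by a hypothetical toy trace), candidate phase transitions show up as abrupt jumps between otherwise stable plateaus.

```python
# Toy change-point detector: flag checkpoints where a tracked quantity jumps
# far more than its typical step-to-step variation.
import numpy as np

def find_transitions(values: np.ndarray, threshold: float = 5.0) -> list[int]:
    """Return checkpoint indices where the step-to-step change is unusually large."""
    diffs = np.abs(np.diff(values))
    scale = np.median(diffs) + 1e-12            # robust estimate of "typical" change
    return [i + 1 for i, d in enumerate(diffs) if d > threshold * scale]

# Hypothetical trace with two sudden jumps standing in for developmental stages.
rng = np.random.default_rng(1)
trace = np.concatenate([np.full(30, 1.0), np.full(30, 4.0), np.full(30, 9.0)])
trace = trace + rng.normal(scale=0.05, size=trace.size)
print(find_transitions(trace))                  # likely [30, 60]
```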
Unpacking Generalization in AI
- Generalization typically means predicting well on new samples from the same distribution the model was trained on; generalizing out of distribution is much harder (see the sketch below).
- Interpretability aims to explain the underlying algorithm responsible for good generalization, not just report a numerical score.
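An illustrative sketch (an assumed example, not from the episode) of why the two notions come apart: a logistic regression that scores well on fresh samples from the training distribution can collapse once a shortcut feature stops correlating with the label, which is exactly the kind of failure a single accuracy number does not explain.

```python
# In-distribution vs. out-of-distribution generalization with a shortcut feature.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n: int, shortcut_reliability: float) -> tuple[np.ndarray, np.ndarray]:
    """Feature 0 is always moderately predictive; feature 1 is a near-noiseless
    shortcut that agrees with the label with the given reliability."""
    y = rng.integers(0, 2, size=n)
    core = rng.normal(loc=2.0 * y - 1.0, scale=1.0, size=n)
    agrees = rng.random(n) < shortcut_reliability
    shortcut_label = np.where(agrees, y, 1 - y)
    shortcut = rng.normal(loc=2.0 * shortcut_label - 1.0, scale=0.1, size=n)
    return np.column_stack([core, shortcut]), y

X_train, y_train = make_data(5000, shortcut_reliability=0.95)
X_id, y_id = make_data(1000, shortcut_reliability=0.95)   # same distribution
X_ood, y_ood = make_data(1000, shortcut_reliability=0.5)  # shortcut breaks

model = LogisticRegression().fit(X_train, y_train)
print("in-distribution accuracy:", model.score(X_id, y_id))        # high
print("out-of-distribution accuracy:", model.score(X_ood, y_ood))  # much lower
```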