AXRP - the AI X-risk Research Podcast

29 - Science of Deep Learning with Vikrant Varma

Apr 25, 2024
Vikrant Varma discusses challenges with unsupervised knowledge discovery, grokking in neural networks, circuit efficiency, and the role of complexity in deep learning. The conversation delves into the balance between memorization and generalization, exploring neural circuits, implicit priors, optimization, and alignment projects at DeepMind.
INSIGHT

Challenges of CCS Revealing True Beliefs

  • Contrast-Consistent Search (CCS) aims to extract a model's latent knowledge by finding a probe whose outputs are logically consistent across a statement and its negation.
  • However, probes may pick up arbitrary or superficial signals unrelated to true semantic content, complicating interpretation.
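As a rough illustration of the point above, here is a minimal sketch of a CCS-style objective (a consistency term plus a confidence term over contrast pairs), written in NumPy. The synthetic data and the names `truth` and `distractor` are assumptions made for this example, not anything taken from the episode. The sketch shows that a probe along a distractor direction can score exactly as well as a probe along the truth direction, so a low loss alone does not identify which feature was found.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ccs_loss(w, b, x_pos, x_neg):
    """CCS-style loss on contrast pairs (statement vs. its negation)."""
    p_pos = sigmoid(x_pos @ w + b)
    p_neg = sigmoid(x_neg @ w + b)
    # Consistency: p(statement) should equal 1 - p(negation).
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    # Confidence: discourage the degenerate answer p = 0.5 everywhere.
    confidence = np.minimum(p_pos, p_neg) ** 2
    return float(np.mean(consistency + confidence))

# Hypothetical toy activations: dimension 0 carries "truth", dimension 1 a
# "distractor". Both flip sign between a statement and its negation.
rng = np.random.default_rng(0)
n = 200
truth = rng.choice([-1.0, 1.0], size=n)
distractor = rng.choice([-1.0, 1.0], size=n)
x_pos = np.stack([truth, distractor], axis=1)
x_neg = -x_pos

loss_truth = ccs_loss(np.array([5.0, 0.0]), 0.0, x_pos, x_neg)
loss_distractor = ccs_loss(np.array([0.0, 5.0]), 0.0, x_pos, x_neg)
loss_zero = ccs_loss(np.zeros(2), 0.0, x_pos, x_neg)

print(loss_truth, loss_distractor, loss_zero)
```

In this toy setup the two probes are indistinguishable by the objective, while the all-zero probe is penalized only by the confidence term.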
INSIGHT

Entanglement Challenges in Probing Models

  • Models can entangle multiple agent beliefs and superficial text features, making latent knowledge extraction challenging.
  • Salient directions in activations might reflect entity beliefs or distractors rather than objective truth.
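The salience concern can be sketched with a toy example, assuming synthetic two-dimensional "activations" in which a hypothetical distractor feature simply has larger variance than the truth feature. Under that assumption, the top principal component, one common unsupervised way of picking a direction, tracks the distractor rather than the truth. The scales and noise level below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Two independent binary features (hypothetical names).
truth = rng.choice([-1.0, 1.0], size=n)
distractor = rng.choice([-1.0, 1.0], size=n)

# Assumption: the distractor is written into the activations at 3x the scale.
acts = np.stack([1.0 * truth, 3.0 * distractor], axis=1)
acts = acts + 0.1 * rng.normal(size=(n, 2))

# Top principal component via SVD of the centered activations.
centered = acts - acts.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
top_direction = vt[0]
projection = centered @ top_direction

corr_truth = abs(np.corrcoef(projection, truth)[0, 1])
corr_distractor = abs(np.corrcoef(projection, distractor)[0, 1])

print(f"|corr| with truth:      {corr_truth:.2f}")
print(f"|corr| with distractor: {corr_distractor:.2f}")
```

Nothing in the unsupervised procedure knows which feature is "truth"; it only sees which direction varies most.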
INSIGHT

Complex Activation Patterns in Models

  • Model activations encode complex interactions, including XOR-type entanglements that linear probes cannot fully separate.
  • These interactions make it difficult to isolate pure truth signals from model activations.
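To make the XOR point concrete, here is a small self-contained sketch. It assumes four toy activation vectors with two binary features and a label equal to their XOR; the feature meanings and the grid of probe weights are hypothetical. A search over linear probes on the raw features tops out at 3 of 4 correct, while a linear probe on activations that also carry the product of the two features separates the classes. This is only an illustration of the geometry, not a claim about how any particular model represents these features.

```python
import itertools
import numpy as np

# Four toy "activations" with two binary features; the label is their XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

def accuracy(w, b, feats, labels):
    """Accuracy of the linear probe sign(feats @ w + b)."""
    preds = (feats @ w + b > 0).astype(int)
    return float(np.mean(preds == labels))

# Search a grid of linear probes on the raw features.
grid = np.linspace(-2.0, 2.0, 21)
best_linear = max(
    accuracy(np.array([w1, w2]), b, X, y)
    for w1, w2, b in itertools.product(grid, grid, grid)
)

# If the activations also contain the interaction term a*b,
# a linear probe can read off XOR as a + b - 2ab.
X_aug = np.column_stack([X, X[:, 0] * X[:, 1]])
acc_aug = accuracy(np.array([1.0, 1.0, -2.0]), -0.5, X_aug, y)

print("best linear probe on raw features:", best_linear)
print("linear probe with interaction feature:", acc_aug)
```

Whether a linear probe succeeds therefore depends on which interaction terms the activations already expose, which is part of what makes probe results hard to interpret.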