The Information Bottleneck

Ravid Shwartz-Ziv & Allen Roush
Dec 15, 2025 • 1h 50min

EP20: Yann LeCun

Yann LeCun, a Turing Award-winning computer scientist and pioneer of deep learning, shares his bold vision for AI after leaving Meta to start Advanced Machine Intelligence. He critiques Silicon Valley's current obsession with scaling language models, arguing that scaling alone won't lead to artificial general intelligence, and advocates instead for world models that simulate the world at the level of abstract concepts. Yann also discusses how systems learn object permanence, the challenges of game AI, and why safety measures should be built into an AI system's architecture by design rather than added afterward.
Dec 10, 2025 • 1h 11min

EP19: AI in Finance and Symbolic AI with Atlas Wang

Atlas Wang (UT Austin faculty, XTX Research Director) joins us to explore two fascinating frontiers: the foundations of symbolic AI and the practical challenges of building AI systems for quantitative finance.

On the symbolic AI side, Atlas shares his recent work proving that neural networks can learn symbolic equations through gradient descent, a surprising result given that gradient descent is continuous while symbolic structures are discrete. We talked about why neural nets learn clean, compositional mathematical structures at all, the mathematical tools involved, and the broader implications for understanding reasoning in AI systems.

The conversation then turns to neuro-symbolic approaches in practice: agents that discover rules through continued learning, propose them symbolically, verify them against domain-specific checkers, and refine their understanding (sketched below).

On the finance side, Atlas pulls back the curtain on what AI research looks like at a high-frequency trading firm. The core problem sounds simple (predict future prices from past data), but the challenge is extreme: markets are dominated by noise, predictions hover near zero correlation, and success means eking out tiny margins across astronomical numbers of trades. He explains why synthetic data techniques that work elsewhere don't translate easily, and why XTX is building time series foundation models rather than adapting language models.

We also discuss the convergence of hiring between frontier AI labs and quantitative finance, and why this is an exceptional moment for ML researchers to consider the finance industry.

Links:
Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation for Neurosymbolic Reasoning - arxiv.org/abs/2506.21797
Atlas' website - https://www.vita-group.space/

Guest: Atlas Wang (UT Austin / XTX)
Hosts: Ravid Shwartz-Ziv & Allen Roush
Music: "Kid Kodi" — Blue Dot Sessions. Source: Free Music Archive. Licensed CC BY-NC 4.0.
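
A minimal sketch of that propose-verify-refine loop, assuming a generic neuro-symbolic setup; the function and argument names here are hypothetical, not taken from Atlas's actual system:

    # Hypothetical sketch: a neural proposer suggests symbolic rules, a
    # domain-specific symbolic checker verifies them exactly, and failures
    # (with counterexamples) feed back into the next round of proposals.
    def discovery_loop(proposer, checker, observations, rounds=10):
        verified = []
        for _ in range(rounds):
            candidate = proposer.propose(observations)      # neural: propose a symbolic rule
            ok, counterexample = checker.verify(candidate)  # symbolic: exact verification
            if ok:
                verified.append(candidate)                  # keep rules that pass the checker
            else:
                proposer.refine(candidate, counterexample)  # learn from the failure case
        return verified

The division of labor is the point: the continuous learner only has to generate candidates, while correctness is delegated to a discrete checker that cannot be fooled by statistical noise.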
Dec 1, 2025 • 1h 45min

EP18: AI Robotics

In this episode, we hosted Judah Goldfeder, a PhD candidate at Columbia University and student researcher at Google, to discuss robotics, reproducibility in ML, and smart buildings.

Key topics covered:

Robotics challenges: We discussed why robotics remains harder than many expected compared to LLMs. The real world is unpredictable and unforgiving, and mistakes have physical consequences. Sim-to-real transfer remains a major bottleneck because simulators are tedious to configure accurately for each robot and environment. Unlike text, robotics lacks foundation models, partly due to limited clean, annotated datasets and the difficulty of collecting diverse real-world data.

Reproducibility crisis: We discussed how self-reported benchmarks can lead to p-hacking and irreproducible results. Centralized evaluation systems (such as Kaggle or ImageNet challenges), where researchers submit algorithms for testing on hidden test sets, seem to drive faster progress.

Smart buildings: Judah's work at Google focuses on using ML to optimize HVAC systems, potentially reducing energy costs and carbon emissions significantly. The challenge is that every building is different, which makes simulation configuration extremely labor-intensive. Generative AI could help by automating the conversion of floor plans or images into accurate building simulations.

Links:
Judah's website - https://judahgoldfeder.com/

Music:
"Kid Kodi" — Blue Dot Sessions — via Free Music Archive — CC BY-NC 4.0.
"Palms Down" — Blue Dot Sessions — via Free Music Archive — CC BY-NC 4.0.
Changes: trimmed
Nov 24, 2025 • 1h 6min

EP17: RL with Will Brown

Will Brown, research lead at Prime Intellect specializing in reinforcement learning (RL) and multi-agent systems, joins us to explore the foundations and practical applications of RL. Will shares insights into the challenges RL faces in LLMs, emphasizing the importance of online sampling and reward models. He discusses multi-agent dynamics, optimization techniques, and the role of game theory in AI development. The discussion also highlights the significance of intermediate results and future directions for RL across a range of applications.
Nov 17, 2025 • 59min

EP16: AI News and Papers

Dive into the intriguing world of AI as the hosts dissect the quirks of conference review dynamics and the innovations of Kimi K2 Thinking. Discover Google's latest TPU advancements and its competition with NVIDIA. The importance of real-world data in robotics takes center stage, while Chain-of-Thought Hijacking raises alarms about model vulnerabilities. Simplifying reinforcement learning with JustRL presents new possibilities, and the Cosmos project sparks curiosity around AI-driven scientific discovery.
Nov 13, 2025 • 1h 23min

EP15: The Information Bottleneck and Scaling Laws with Alex Alemi

In this discussion, Alex Alemi, a prominent AI researcher at Anthropic formerly with Google Brain and Disney, delves into the concept of the information bottleneck. He explains how it captures the essential aspects of data while avoiding overfitting: compress the input as far as possible while keeping only what is relevant to the prediction target (the objective is sketched below). Alemi also highlights scaling laws, revealing how smaller experiments can forecast the behavior of larger ones. He offers insights on the importance of compression in understanding models, and challenges researchers to pursue ambitious questions with broader implications for society, such as job disruption.
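
The summary above doesn't state the objective itself. For reference, the classic information bottleneck Lagrangian (Tishby, Pereira & Bialek) seeks a representation Z of the input X that is as compressed as possible while staying informative about the target Y:

    \min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)

where I(\cdot\,;\cdot) is mutual information and \beta > 0 sets the trade-off: small \beta favors aggressive compression, large \beta favors retaining predictive information. This is the precise sense in which the method keeps the essential aspects of the data while discarding the rest.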
Nov 10, 2025 • 57min

EP14: AI News and Papers

Explore the intricate dynamics of AI in healthcare, where GPT-5's fragility raises pressing concerns about reliability and education. Discover Stanford's innovative Cartridges approach, which aims to cut down the hefty computational costs of long-context models. Delve into the transformative potential of Continuous Autoregressive Language Models and learn about practical strategies from the Smol Training Playbook. Join in the debate on benchmarks and dataset challenges in this fast-evolving field!
Nov 7, 2025 • 1h 21min

EP13: Recurrent-Depth Models and Latent Reasoning with Jonas Geiping

In this episode, we host Jonas Geiping from the ELLIS Institute & Max Planck Institute for Intelligent Systems, Tübingen AI Center, Germany. We talked about his broad research on recurrent-depth models and latent reasoning in large language models (LLMs): what these models can and can't do, the challenges and next breakthroughs in the field, world models, and the future of developing better models (a brief sketch of the recurrent-depth idea appears below). We also discussed safety and interpretability, and the role of scaling laws in AI development.

Chapters:
00:00 Introduction and Guest Introduction
01:03 Peer Review in Preprint Servers
06:57 New Developments in Coding Models
09:34 Open Source Models in Europe
11:00 Dynamic Layers in LLMs
26:05 Training Playbook Insights
30:05 Recurrent Depth Models and Reasoning Tasks
43:59 Exploring Recursive Reasoning Models
46:46 The Role of World Models in AI
48:41 Innovations in AI Training and Simulation
50:39 The Promise of Recurrent Depth Models
52:34 Navigating the Future of AI Algorithms
54:44 The Bitter Lesson of AI Development
59:11 Advising the Next Generation of Researchers
01:06:42 Safety and Interpretability in AI Models
01:10:46 Scaling Laws and Their Implications
01:16:19 The Role of PhDs in AI Research

Links and papers:
Jonas' website - https://jonasgeiping.github.io/
Scaling up test-time compute with latent reasoning: A recurrent depth approach - https://arxiv.org/abs/2502.05171
The Smol Training Playbook: The Secrets to Building World-Class LLMs - https://huggingface.co/spaces/HuggingFaceTB/smol-training-playbook
VaultGemma: A Differentially Private Gemma Model - https://arxiv.org/abs/2510.15001

Music:
"Kid Kodi" — Blue Dot Sessions — via Free Music Archive — CC BY-NC 4.0.
"Palms Down" — Blue Dot Sessions — via Free Music Archive — CC BY-NC 4.0.
Changes: trimmed
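
A minimal sketch of the recurrent-depth idea from the linked paper, under a simplified reading: instead of stacking more unique layers, one shared core block is iterated in latent space, so test-time compute scales by running more iterations. The module shapes and names below are illustrative, not the paper's exact architecture:

    import torch
    import torch.nn as nn

    class RecurrentDepthLM(nn.Module):
        """Illustrative: prelude -> iterated shared core -> coda."""
        def __init__(self, vocab_size=32000, d_model=512):
            super().__init__()
            self.prelude = nn.Embedding(vocab_size, d_model)          # tokens -> latents
            self.core = nn.TransformerEncoderLayer(d_model, nhead=8,
                                                   batch_first=True)  # one shared block
            self.coda = nn.Linear(d_model, vocab_size)                # latents -> logits

        def forward(self, tokens, num_iterations=8):
            x = self.prelude(tokens)
            h = torch.randn_like(x)          # random initial latent state
            for _ in range(num_iterations):  # more iterations = more test-time compute
                h = self.core(h + x)         # re-inject the input, reuse the same weights
            return self.coda(h)

    model = RecurrentDepthLM()
    logits = model(torch.randint(0, 32000, (1, 16)), num_iterations=32)

Because the weights are shared across iterations, such a model can in principle be run for more iterations at inference than during training, trading latency for reasoning carried out in latent space rather than in emitted tokens.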
Nov 3, 2025 • 58min

EP12: Adversarial attacks and compression with Jack Morris

In this episode of the Information Bottleneck Podcast, we host Jack Morris, a PhD student at Cornell, to discuss adversarial examples (Jack created TextAttack, one of the first software frameworks for adversarial attacks on language models; a toy attack sketch appears below), the Platonic representation hypothesis, the implications of inversion techniques, and the role of compression in language models.

Links:
Jack's website - https://jxmo.io/
TextAttack - https://arxiv.org/abs/2005.05909
How much do language models memorize? - https://arxiv.org/abs/2505.24832
DeepSeek OCR - https://www.arxiv.org/abs/2510.18234

Chapters:
00:00 Introduction and AI News Highlights
04:53 The Importance of Fine-Tuning Models
10:01 Challenges in Open Source AI Models
14:34 The Future of Model Scaling and Sparsity
19:39 Exploring Model Routing and User Experience
24:34 Jack's Research: TextAttack and Adversarial Examples
29:33 The Platonic Representation Hypothesis
34:23 Implications of Inversion and Security in AI
39:20 The Role of Compression in Language Models
44:10 Future Directions in AI Research and Personalization
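
A toy sketch of the style of attack TextAttack systematizes, greedy word substitution against a text classifier; the scorer and synonym table here are hypothetical stand-ins, not TextAttack's actual API:

    # Greedily swap words to reduce the victim model's confidence in the
    # correct label. A real attack adds constraints (semantics, grammar,
    # query budgets); this is only the skeleton of the idea.
    def greedy_substitution_attack(words, synonyms, score_fn):
        best = list(words)
        for i, word in enumerate(words):
            for candidate in synonyms.get(word, []):
                trial = best[:i] + [candidate] + best[i + 1:]
                if score_fn(trial) < score_fn(best):  # keep swaps that hurt the model
                    best = trial
        return best

Attacks like this succeed because tiny, meaning-preserving edits can move an input across a model's decision boundary, which is exactly the fragility the episode digs into.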
Oct 28, 2025 • 1h 18min

EP11: JEPA with Randall Balestriero

Randall Balestriero, an assistant professor at Brown University specializing in representation learning, dives deep into Joint Embedding Predictive Architectures (JEPA). He explains how JEPA learns data representations without reconstruction, focusing on meaningful features while compressing irrelevant details. The discussion covers the challenges of model collapse, prediction tasks shaping feature learning, and the implications for AGI benchmarks. Balestriero also shares insights on evaluating JEPA models, the role of latent variables, and the growing opportunity in JEPA research.
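
A minimal sketch of the JEPA training step described above, assuming a standard joint-embedding setup: a context encoder plus predictor tries to match the target encoder's embedding of a hidden view, so the loss lives entirely in latent space and nothing is reconstructed in pixel space. Names and the EMA update are illustrative simplifications:

    import torch
    import torch.nn.functional as F

    def jepa_step(context_encoder, target_encoder, predictor, x_context, x_target):
        with torch.no_grad():                           # target branch gets no gradients
            target_z = target_encoder(x_target)         # embedding of the hidden view
        pred_z = predictor(context_encoder(x_context))  # predict it from the visible view
        loss = F.mse_loss(pred_z, target_z)             # latent-space loss, no reconstruction
        loss.backward()
        return loss

    # The target encoder is typically an exponential moving average of the
    # context encoder, one common guard against representational collapse:
    def ema_update(target_encoder, context_encoder, tau=0.996):
        for tp, cp in zip(target_encoder.parameters(), context_encoder.parameters()):
            tp.data.mul_(tau).add_(cp.data, alpha=1 - tau)

Predicting only in latent space is what lets JEPA compress away irrelevant detail while keeping meaningful features, the property the episode centers on.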
