AI Breakdown

Oct 16, 2025 • 8min

The Markovian Thinker

In this episode, we discuss The Markovian Thinker by Milad Aghajohari, Kamran Chitsaz, Amirhossein Kazemnejad, Sarath Chandar, Alessandro Sordoni, Aaron Courville, Siva Reddy. The paper proposes Markovian Thinking, a reinforcement learning paradigm that limits reasoning context to a constant-size state, enabling linear compute with constant memory rather than quadratic overhead. They implement this approach in Delethink, an environment that segments reasoning into fixed-size chunks with learned textual states to seamlessly continue reasoning after resets. Experiments show Delethink-trained models achieve longer reasoning chains more efficiently and scale better than standard methods, significantly reducing computational costs.
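The chunked-reasoning idea in this summary can be made concrete with a toy sketch: generation proceeds in fixed-size chunks, and only a constant-size carryover state survives each reset, so the context never grows with the total reasoning length. This is a minimal illustration of the paradigm as described above, not the Delethink implementation; `generate_chunk` is a hypothetical stand-in for an LLM call, and the token budgets are toy values.

```python
CHUNK_TOKENS = 8  # fixed chunk budget (toy value)
STATE_TOKENS = 4  # constant-size carryover state (toy value)

def generate_chunk(state_tokens, budget):
    # Placeholder "model": emits `budget` tokens continuing from the state.
    start = int(state_tokens[-1]) + 1 if state_tokens else 0
    return [str(t) for t in range(start, start + budget)]

def markovian_reason(total_tokens):
    """Produce `total_tokens` reasoning tokens while context stays bounded."""
    state, trace, peak_context = [], [], 0
    while len(trace) < total_tokens:
        budget = min(CHUNK_TOKENS, total_tokens - len(trace))
        chunk = generate_chunk(state, budget)
        trace.extend(chunk)
        peak_context = max(peak_context, len(state) + len(chunk))
        # Reset: keep only the last STATE_TOKENS tokens as the new state.
        state = chunk[-STATE_TOKENS:]
    return trace, peak_context

trace, peak = markovian_reason(50)
```

Because each model call sees at most `STATE_TOKENS + CHUNK_TOKENS` tokens, per-step compute is constant and total compute scales linearly with reasoning length, which is the quadratic-to-linear saving the summary describes.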
Oct 8, 2025 • 8min

DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL

In this episode, we discuss DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL by Rui Lu, Zhenyu Hou, Zihan Wang, Hanchen Zhang, Xiao Liu, Yujiang Li, Shi Feng, Jie Tang, Yuxiao Dong. The paper introduces DeepDive, a method to improve large language models' deep search capabilities by automatically generating complex questions and applying multi-turn reinforcement learning for enhanced long-horizon reasoning. DeepDive-32B outperforms existing open-source models on browsing benchmarks like BrowseComp. The approach also enables scalable tool usage during inference, with all resources made publicly available.
Oct 3, 2025 • 7min

Towards a Physics Foundation Model

In this episode, we discuss Towards a Physics Foundation Model by Florian Wiesner, Matthias Wessling, Stephen Baek. This paper introduces the General Physics Transformer (GPhyT), a foundation model trained on diverse simulation data that can simulate multiple complex physical systems without explicit knowledge of governing equations. GPhyT outperforms specialized models by up to 29 times, generalizes zero-shot to unseen physics tasks, and maintains stable predictions over long time horizons. This work demonstrates the feasibility of a universal physics foundation model, potentially revolutionizing computational science by eliminating the need for task-specific solvers.
Sep 30, 2025 • 8min

Scalable Option Learning in High-Throughput Environments

In this episode, we discuss Scalable Option Learning in High-Throughput Environments by Mikael Henaff, Scott Fujimoto, Michael Rabbat. The paper presents Scalable Option Learning (SOL), a hierarchical reinforcement learning algorithm designed for high-throughput environments. SOL achieves a 25x increase in training speed and outperforms flat agents by training on 20 billion frames in the game NetHack. The method is also validated on MiniHack and Mujoco, demonstrating broad applicability and scalability.
Sep 24, 2025 • 8min

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

In this episode, we discuss Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning by Shenzhi Wang, Le Yu, Chang Gao, Chujie Zheng, Shixuan Liu, Rui Lu, Kai Dang, Xionghui Chen, Jianxin Yang, Zhenru Zhang, Yuqiong Liu, An Yang, Andrew Zhao, Yang Yue, Shiji Song, Bowen Yu, Gao Huang, Junyang Lin. This paper investigates Reinforcement Learning with Verifiable Rewards (RLVR) by analyzing token entropy patterns during Chain-of-Thought reasoning in Large Language Models. It finds that a small subset of high-entropy "forking" tokens critically guide reasoning pathways and that RLVR primarily adjusts these tokens to improve performance. Leveraging this insight, the authors enhance RLVR efficiency by focusing updates on these tokens, achieving better results with fewer token updates across multiple model scales.
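The core idea summarized here — restrict the RL update to the small high-entropy "forking" minority of tokens — can be sketched numerically. This is an illustrative assumption-laden toy, not the paper's recipe: the 20% keep fraction, the REINFORCE-style surrogate, and the random distributions are all stand-ins.

```python
import numpy as np

def token_entropies(probs):
    """Entropy of each token position's predictive distribution (rows of `probs`)."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def high_entropy_mask(probs, keep_frac=0.2):
    """Boolean mask selecting the top `keep_frac` most uncertain positions."""
    ent = token_entropies(probs)
    k = max(1, int(len(ent) * keep_frac))
    threshold = np.sort(ent)[-k]
    return ent >= threshold

def masked_policy_gradient(logprobs, advantages, mask):
    """REINFORCE-style surrogate restricted to the masked positions."""
    return float((mask * advantages * logprobs).sum() / max(mask.sum(), 1))

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(5), size=10)   # 10 positions, 5-way vocab (toy)
mask = high_entropy_mask(probs, keep_frac=0.2)
loss = masked_policy_gradient(np.log(probs.max(axis=1)),
                              rng.normal(size=10), mask)
```

With 10 positions and `keep_frac=0.2`, only 2 token positions contribute to the update, mirroring the claim that updating the high-entropy minority suffices.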
Sep 19, 2025 • 9min

Reverse-Engineered Reasoning for Open-Ended Generation

In this episode, we discuss Reverse-Engineered Reasoning for Open-Ended Generation by Haozhe Wang, Haoran Que, Qixin Xu, Minghao Liu, Wangchunshu Zhou, Jiazhan Feng, Wanjun Zhong, Wei Ye, Tong Yang, Wenhao Huang, Ge Zhang, Fangzhen Lin. The paper introduces REverse-Engineered Reasoning (REER), a novel backward approach that uncovers deep reasoning steps from known good solutions instead of forward trial-and-error or imitation. Using REER, the authors create DeepWriting-20K, a large dataset of reasoning trajectories for open-ended tasks, and train DeepWriter-8B, a model that outperforms strong open-source baselines. DeepWriter-8B also matches or exceeds the performance of leading proprietary models like GPT-4o and Claude 3.5.
Sep 16, 2025 • 7min

Scaling Performance of Large Language Model Pretraining

In this episode, we discuss Scaling Performance of Large Language Model Pretraining by Alexander Interrante-Grant, Carla Varela-Rosa, Suhaas Narayan, Chris Connelly, Albert Reuther. The paper explores the challenges and strategies involved in training large language models (LLMs) at scale, focusing on distributed training and managing massive datasets across many computing nodes. It provides practical recommendations for optimizing data parallelism to fully utilize GPU resources during pretraining. The goal is to offer clearer guidance on scaling LLM training pipelines, addressing a gap in publicly available information.
Sep 15, 2025 • 9min

General Social Agents

In this episode, we discuss General Social Agents by Benjamin S. Manning, John J. Horton. The paper proposes using AI agents guided by social science theory and natural language instructions to predict human behavior in novel settings without ad hoc adjustments. By training these agents on human data from related "seed" games, they successfully predict outcomes across a large and diverse set of new games. Their approach outperforms traditional game-theoretic predictions and existing AI models, even exceeding predictions based on published human data in some novel scenarios.
Sep 12, 2025 • 7min

We need a new ethics for a world of AI agents

In this episode, we discuss We need a new ethics for a world of AI agents by Iason Gabriel, Geoff Keeling, Arianna Manzini, James Evans. The paper examines the shift toward autonomous AI agents capable of goal-directed actions with minimal human oversight. It highlights both the potential benefits of these agents, such as economic growth and scientific advancement, and the associated risks involving responsibility, safety, and social dynamics. The authors call for increased collaboration among various stakeholders to address challenges and ensure beneficial human-agent and agent-agent interactions.
Sep 11, 2025 • 9min

Hierarchical Reasoning Model

In this episode, we discuss Hierarchical Reasoning Model by Guan Wang, Jin Li, Yuhao Sun, Xing Chen, Changling Liu, Yue Wu, Meng Lu, Sen Song, Yasin Abbasi Yadkori. The paper introduces the Hierarchical Reasoning Model (HRM), a recurrent architecture inspired by the brain's hierarchical processing that achieves deep, efficient reasoning in a single forward pass. HRM uses two interdependent modules for abstract planning and detailed computation, enabling it to excel on complex tasks like Sudoku and maze solving with minimal data and no pre-training. It outperforms larger models on the ARC benchmark, highlighting its promise for advancing general-purpose AI reasoning.
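The two-module structure the summary attributes to HRM — a slow module for abstract planning and a fast module for detailed computation, updated at different timescales within one forward pass — can be illustrated with a toy two-timescale recurrence. The update rules and constants below are placeholders for illustration only, not the paper's architecture.

```python
K = 4  # fast computation steps per slow (planning) step

def hrm_forward(x, steps=12):
    """Toy two-timescale recurrence: fast state updates every step,
    slow state updates every K steps from the fast module's result."""
    slow, fast = 0.0, float(x)
    for t in range(steps):
        fast = 0.5 * fast + slow + x        # fast module: every step
        if (t + 1) % K == 0:
            slow = 0.9 * slow + 0.1 * fast  # slow module: reads fast result
            fast = 0.0                      # fast module restarts from the plan
    return slow

y = hrm_forward(1.0)
```

The interdependence runs both ways: the fast module is conditioned on the slow state at every step, while the slow state is refreshed only from the fast module's accumulated result, giving effective depth without unrolling a single deep stack.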
