

AI Breakdown
agibreakdown
The podcast where we use AI to break down recent AI papers and provide simplified explanations of intricate AI topics for educational purposes.
The content presented here is generated automatically using LLM and text-to-speech technologies. While every effort is made to ensure accuracy, any misrepresentations or inaccuracies are unintentional and a result of evolving technology. We value your feedback to enhance our podcast and provide you with the best possible learning experience.
Episodes

Sep 10, 2025 • 8min
ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts
In this episode, we discuss ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts by Yuying Ge, Yixiao Ge, Chen Li, Teng Wang, Junfu Pu, Yizhuo Li, Lu Qiu, Jin Ma, Lisheng Duan, Xinyu Zuo, Jinwen Luo, Weibo Gu, Zexuan Li, Xiaojing Zhang, Yangyu Tao, Han Hu, Di Wang, Ying Shan. The paper presents ARC-Hunyuan-Video, a 7B-parameter multimodal model designed for detailed, temporally structured understanding of short user-generated videos using visual, audio, and text inputs. It supports tasks like timestamped captioning, summarization, question answering, and video reasoning, trained through a multi-stage process that includes reinforcement learning. Evaluations show strong real-world performance, efficiency, and a positive impact on user engagement in production deployment.

Sep 9, 2025 • 8min
Small Language Models are the Future of Agentic AI
In this episode, we discuss Small Language Models are the Future of Agentic AI by Peter Belcak, Greg Heinrich, Shizhe Diao, Yonggan Fu, Xin Dong, Saurav Muralidharan, Yingyan Celine Lin, Pavlo Molchanov. The paper argues that small language models (SLMs) are more suitable, powerful enough, and cost-effective for many specialized agentic AI tasks compared to large language models (LLMs). It proposes that heterogeneous agentic systems using multiple models are ideal when general conversational abilities are needed and presents an algorithm for converting LLM-based agents to SLM-based ones. The authors emphasize the economic and operational benefits of shifting towards SLMs and invite further discussion to advance affordable AI deployment.

Sep 8, 2025 • 7min
Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents
In this episode, we discuss Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents by Davide Paglieri, Bartłomiej Cupiał, Jonathan Cook, Ulyana Piterbarg, Jens Tuyls, Edward Grefenstette, Jakob Nicolaus Foerster, Jack Parker-Holder, Tim Rocktäschel. The paper introduces a framework enabling large language model agents to dynamically decide when to plan during task execution, improving efficiency and performance. They propose a two-stage training process combining supervised fine-tuning and reinforcement learning to develop this capability. Experiments show these dynamically planning agents are more sample-efficient, achieve complex goals better, and can be guided by human plans.

Sep 7, 2025 • 8min
Why Language Models Hallucinate
In this episode, we discuss Why Language Models Hallucinate by Adam Tauman Kalai, Ofir Nachum, Santosh S. Vempala, Edwin Zhang. The paper explains that hallucinations in large language models arise because training and evaluation reward guessing over admitting uncertainty, framing the issue as errors in binary classification. It shows that models become incentivized to produce plausible but incorrect answers to perform well on benchmarks. The authors propose that addressing hallucinations requires changing how benchmarks are scored, so that uncertain responses are no longer penalized, promoting more trustworthy AI.

Aug 19, 2025 • 7min
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
In this episode, we discuss Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens by Chengshuai Zhao, Zhen Tan, Pingchuan Ma, Dawei Li, Bohan Jiang, Yancheng Wang, Yingzhen Yang, Huan Liu. The paper investigates Chain-of-Thought (CoT) reasoning in large language models, revealing it may not reflect true inferential processes but rather learned patterns tied to training data distributions. Using a controlled environment called DataAlchemy, the authors show CoT reasoning breaks down when models face out-of-distribution tasks, lengths, or formats. This highlights the limitations of CoT prompting and the challenge of achieving authentic, generalizable reasoning in LLMs.

Aug 15, 2025 • 8min
Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models
In this episode, we discuss Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models by Vlad Sobal, Wancong Zhang, Kyunghyun Cho, Randall Balestriero, Tim G. J. Rudner, Yann LeCun. The paper compares model-free reinforcement learning and model-based control methods for solving navigation tasks using offline, reward-free data. It finds that reinforcement learning performs best with large, high-quality datasets, while model-based planning with latent dynamics models generalizes better to new environments and handles suboptimal data more efficiently. Overall, latent model-based planning is highlighted as a robust approach for offline learning and adapting to diverse tasks.

Aug 13, 2025 • 9min
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
In this episode, we discuss Persona Vectors: Monitoring and Controlling Character Traits in Language Models by Runjin Chen, Andy Arditi, Henry Sleight, Owain Evans, Jack Lindsey. The paper introduces persona vectors in large language models’ activation space that correspond to traits like evil or sycophancy and can track personality changes. These vectors help predict, control, and mitigate unintended personality shifts during training and deployment. Additionally, the method automates persona vector extraction from natural language descriptions and aids in identifying problematic training data.

Aug 1, 2025 • 9min
Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning
In this episode, we discuss Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning by Jaedong Hwang, Kumar Tanmay, Seok-Jin Lee, Ayush Agrawal, Hamid Palangi, Kumar Ayush, Ila Fiete, Paul Pu Liang. The paper introduces GEOFACT-X, a multilingual factual reasoning benchmark with annotated reasoning traces in five languages to better evaluate language consistency in LLM reasoning. It proposes BRIDGE, a training method using supervised fine-tuning and reinforcement learning with a language-consistency reward to align model reasoning with the input language. Experiments show that BRIDGE significantly improves multilingual reasoning fidelity, highlighting the importance of reasoning-aware multilingual reinforcement learning for cross-lingual generalization.

Jul 31, 2025 • 9min
Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards
In this episode, we discuss Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards by Jaeho Kim, Yunseok Lee, Seulki Lee. The paper addresses challenges in AI conference peer review caused by massive submission volumes and declining review quality. It proposes a bi-directional review system where authors evaluate reviewers, and reviewers receive formal accreditation to improve accountability. The paper focuses on reforming reviewer responsibility through a two-stage feedback loop and incentive mechanisms to promote sustainable, high-quality reviews.

Jul 31, 2025 • 8min
Working with AI: Measuring the Occupational Implications of Generative AI
In this episode, we discuss Working with AI: Measuring the Occupational Implications of Generative AI by Kiran Tomlinson, Sonia Jaffe, Will Wang, Scott Counts, Siddharth Suri. The paper analyzes 200,000 anonymized interactions between users and Microsoft Bing Copilot to understand how AI assists with various work activities. It identifies information gathering, writing, teaching, and advising as key activities supported by AI and calculates an AI applicability score across occupations. The study finds the highest AI impact on knowledge work and communication-related jobs, highlighting correlations with wage, education, and real-world AI usage patterns.