The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Latest episodes

102 snips
Jun 5, 2025 • 1h 25min

Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734

In this insightful conversation, Charles Martin, the founder of Calculation Consulting and an AI researcher merging physics with machine learning, introduces WeightWatcher, a groundbreaking tool for enhancing Deep Neural Networks. He explores the revolutionary Heavy-Tailed Self-Regularization theory and how it exposes phases like grokking and generalization collapse. The discussion delves into fine-tuning models, the perplexing relationship between model quality and hallucinations, and the challenges of generative AI, providing valuable lessons for real-world applications.
284 snips
May 28, 2025 • 26min

Google I/O 2025 Special Edition - #733

Logan Kilpatrick and Shrestha Basu Mallick from Google DeepMind walk through the announcements from Google I/O 2025. They discuss new Gemini API features such as thinking budgets and thought summaries, as well as native audio output, which makes voice AI more expressive. The duo shares insights on the challenges of building real-time voice applications, including latency and voice detection. They also share a playful wish list for next year's event, including broader language support to foster global inclusivity.
98 snips
May 21, 2025 • 57min

RAG Risks: Why Retrieval-Augmented LLMs are Not Safer with Sebastian Gehrmann - #732

Sebastian Gehrmann, head of Responsible AI at Bloomberg, dives into the complexities of AI safety, particularly in retrieval-augmented generation (RAG) systems. He reveals how RAG can unintentionally compromise safety, even leading to unsafe outputs. The conversation highlights unique risks in financial services, emphasizing the need for specific governance frameworks and tailored evaluation methods. Gehrmann also addresses prompt engineering as a strategy for enhancing safety, underscoring the necessity for ongoing collaboration in the AI field to tackle emerging vulnerabilities.
244 snips
May 13, 2025 • 1h 1min

From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731

Mahesh Sathiamoorthy, co-founder and CEO of Bespoke Labs, dives into the innovative world of reinforcement learning (RL) and its impact on AI agents. He highlights the importance of data curation and evaluation, asserting that RL outperforms traditional prompting methods. The conversation touches on limitations of supervised fine-tuning, reward-shaping strategies, and specialized models like MiniCheck for hallucination detection. Mahesh also discusses tools like Curator and the exciting future of automated AI engineering, promising to make powerful solutions accessible to all.
375 snips
May 6, 2025 • 1h 7min

How OpenAI Builds AI Agents That Think and Act with Josh Tobin - #730

Josh Tobin, a member of the technical staff at OpenAI and co-founder of Gantry, dives into the fascinating world of AI agents. He discusses OpenAI's innovative offerings like Deep Research and Operator, highlighting their ability to manage complex tasks through advanced reasoning. The conversation also explores unexpected use cases for these agents and the future of human-AI collaboration in software development. Additionally, Josh emphasizes the challenges of ensuring trust and safety as AI systems evolve, making for an insightful and thought-provoking discussion.
86 snips
Apr 30, 2025 • 56min

CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729

In this engaging discussion, Nidhi Rastogi, an assistant professor at the Rochester Institute of Technology specializing in Cyber Threat Intelligence, dives into her project CTIBench. She explores the evolution of AI in cybersecurity, emphasizing how large language models (LLMs) enhance threat detection and defense. Nidhi discusses the challenges of outdated information and the advantages of Retrieval-Augmented Generation for real-time responses. She also highlights how benchmarks can expose model limitations and the vital role of understanding emerging threats in cybersecurity.
146 snips
Apr 23, 2025 • 54min

Generative Benchmarking with Kelly Hong - #728

Kelly Hong, a researcher at Chroma, delves into generative benchmarking, a vital approach for evaluating retrieval systems with synthetic data. She critiques traditional benchmarks for failing to mimic real-world queries, stressing the importance of aligning LLM judges with human preferences. Kelly explains a two-step process: filtering relevant documents and generating user-like queries to enhance AI performance. The discussion also covers the nuances of chunking strategies and the differences between benchmark and real-world queries, advocating for a more systematic AI evaluation.
137 snips
Apr 14, 2025 • 1h 34min

Exploring the Biology of LLMs with Circuit Tracing with Emmanuel Ameisen - #727

Emmanuel Ameisen, a research engineer at Anthropic specializing in interpretability research, shares insights from his recent studies on large language models. He discusses how mechanistic interpretability methods shed light on internal processes, showing how models plan creative tasks like poetry and calculate math using unique algorithms. The conversation dives into neural pathways, revealing how hallucinations stem from separate recognition circuits. Emmanuel highlights the challenges of accurately interpreting AI behavior and the importance of understanding these systems for safety and reliability.
146 snips
Apr 8, 2025 • 52min

Teaching LLMs to Self-Reflect with Reinforcement Learning with Maohao Shen - #726

Maohao Shen, a PhD student at MIT specializing in AI reliability, discusses his groundbreaking work on 'Satori.' He reveals how it enhances language model reasoning through reinforcement learning, enabling self-reflection and exploration. The podcast dives into the innovative Chain-of-Action-Thought approach, which guides models in complex reasoning tasks. Maohao also explains the two-stage training process, including format tuning and self-corrective techniques. The conversation highlights Satori’s impressive performance and its potential to redefine AI reasoning capabilities.
71 snips
Mar 31, 2025 • 1h 9min

Waymo's Foundation Model for Autonomous Driving with Drago Anguelov - #725

In this engaging discussion, Drago Anguelov, VP of AI foundations at Waymo, sheds light on the groundbreaking integration of foundation models in autonomous driving. He explains how Waymo harnesses large-scale machine learning and multimodal sensor data to enhance perception and planning. Drago also addresses safety measures, including rigorous validation frameworks and predictive models. The conversation dives into the challenges of scaling these models across diverse driving environments and the future of AV testing through sophisticated simulations.
