
Vanishing Gradients

Latest episodes

Jul 18, 2025 • 41min

Episode 54: Scaling AI: From Colab to Clusters — A Practitioner’s Guide to Distributed Training and Inference

Zach Mueller, who leads Accelerate at Hugging Face, shares his expertise on scaling AI from cozy Colab environments to powerful clusters. He explains how to get started with just a couple of GPUs, debunks myths about performance bottlenecks, and discusses practical strategies for training on a budget. Zach emphasizes the importance of understanding distributed systems for any ML engineer and underscores how these skills can make a significant impact on their career. Tune in for actionable insights and demystifying tips!
Jul 8, 2025 • 45min

Episode 53: Human-Seeded Evals & Self-Tuning Agents: Samuel Colvin on Shipping Reliable LLMs

Samuel Colvin, the mastermind behind Pydantic and founder of Logfire, discusses the often-overlooked challenges in AI reliability. He emphasizes how durability is key, not just flashy demos, and reveals that tiny feedback loops can significantly enhance performance insights. Colvin introduces innovative concepts like prompt self-repair systems and drift alarms, which can catch shifts before they become problems. He advocates for business-driven metrics that ensure features align with real goals, making AI not just functional but dependable in real-world applications.
Jul 2, 2025 • 29min

Episode 52: Why Most LLM Products Break at Retrieval (And How to Fix Them)

Eric Ma, who leads data science research at Moderna, dives into the challenges of aligning retrieval with user intent in LLM-powered systems. He argues that most features fail not at the model level but at the context and retrieval layer. Eric reveals how a simple YAML-based approach can outperform complex pipelines and discusses the pitfalls of vague user queries. He also emphasizes the importance of evolving retrieval workflows to meet user needs and when it's sufficient to rely on intuition versus formal evaluation in refining these systems.
Jun 26, 2025 • 48min

Episode 51: Why We Built an MCP Server and What Broke First

In this discussion, Philip Carter, Product Management Director at Salesforce and former Principal PM at Honeycomb, shares insights on creating LLM-powered features. He explains the nuances of integrating real production data with these systems. Carter dives into the challenges of tool use, prompt templates, and flaky model behavior. He also discusses the development of the innovative MCP server that enhances observability in AI systems, emphasizing its role in improving user experience and navigating the pitfalls of SaaS product development.
Jun 17, 2025 • 28min

Episode 50: A Field Guide to Rapidly Improving AI Products -- With Hamel Husain

Hamel Husain, an AI specialist with experience at Airbnb, GitHub, and DataRobot, discusses improving AI products through effective evaluation. He highlights the importance of error analysis and systematic iteration in development. The conversation reveals common pitfalls in debugging AI systems, stressing the collaboration between engineers and domain experts to drive progress. Hamel also emphasizes that evaluation should be a comprehensive process, balancing immediate fixes with strategic assessment. This dialogue is a must-hear for anyone grappling with AI system enhancements.
Jun 5, 2025 • 1h 22min

Episode 49: Why Data and AI Still Break at Scale (and What to Do About It)

Akshay Agrawal, founder of Marimo and former Google Brain researcher, discusses why data and AI systems still break at scale. He emphasizes the need for robust infrastructure over just improved models. The conversation covers the importance of reproducibility and the shortcomings of traditional tools. Akshay introduces Marimo's design for modular AI applications and the difficulties of debugging large language models. Live demos illustrate Marimo's capabilities in data extraction and agentic workflows, merging technical insights with cultural reflections in data science.
May 23, 2025 • 1h 4min

Episode 48: How to Benchmark AGI with Greg Kamradt

If we want to make progress toward AGI, we need a clear definition of intelligence—and a way to measure it. In this episode, Hugo talks with Greg Kamradt, President of the ARC Prize Foundation, about ARC-AGI: a benchmark built on François Chollet's definition of intelligence as "the efficiency at which you learn new things." Unlike most evals that focus on memorization or task completion, ARC is designed to measure generalization—and expose where today's top models fall short.

They discuss:

🧠 Why we still lack a shared definition of intelligence
🧪 How ARC tasks force models to learn novel skills at test time
📉 Why GPT-4-class models still underperform on ARC
🔎 The limits of traditional benchmarks like MMLU and BIG-bench
⚙️ What the OpenAI o3 results reveal—and what they don't
💡 Why generalization and efficiency, not raw capability, are key to AGI

Greg also shares what he's seeing in the wild: how startups and independent researchers are using ARC as a North Star, how benchmarks shape the frontier, and why the ARC team believes we'll know we've reached AGI when humans can no longer write tasks that models can't solve.

This conversation is about evaluation—not hype. If you care about where AI is really headed, this one's worth your time.

LINKS

ARC Prize -- What is ARC-AGI?
On the Measure of Intelligence by François Chollet
Greg Kamradt on Twitter
Hugo's High Signal Podcast with Fei-Fei Li
Vanishing Gradients YouTube Channel
Upcoming Events on Luma
Hugo's recent newsletter about upcoming events and more!
Watch the podcast here on YouTube!

🎓 Want to go deeper? Check out Hugo's course: Building LLM Applications for Data Scientists and Software Engineers. Learn how to design, test, and deploy production-grade LLM systems, with observability, feedback loops, and structure built in. This isn't about vibes or fragile agents. It's about making LLMs reliable, testable, and actually useful. Includes over $800 in compute credits and guest lectures from experts at DeepMind, Moderna, and more. Cohort starts July 8. Use this link for a 10% discount.
Apr 7, 2025 • 1h 19min

Episode 47: The Great Pacific Garbage Patch of Code Slop with Joe Reis

Joe Reis, co-author of Fundamentals of Data Engineering and critic of 'vibe coding,' engages in a thought-provoking discussion about the impact of AI on software development. He highlights the dangers of coding by intuition without structure, exploring the balance between innovation and traditional practices. The conversation examines the implications of AI tools on technical debt, security risks, and the evolution of production standards. Moreover, Reis reflects on the importance of craftsmanship and the learning curve in an age of disposable code.
Apr 3, 2025 • 1h 9min

Episode 46: Software Composition Is the New Vibe Coding

Greg Ceccarelli, co-founder of SpecStory and ex-CPO at Pluralsight, dives into the groundbreaking concept of software composition, likening it to musical composition. He discusses how AI and LLMs facilitate vibe coding, making programming more intuitive and accessible. The conversation reveals the democratizing power of these tools, emphasizing intent over traditional coding and the collaborative potential they unleash. Greg also addresses the challenges of evolving technologies in data science and the importance of balancing creativity with robust practices in software development.
Feb 20, 2025 • 1h 18min

Episode 45: Your AI application is broken. Here’s what to do about it.

Joining the discussion is Hamel Husain, a seasoned ML engineer and open-source contributor, who shares invaluable insights on debugging generative AI systems. He emphasizes that understanding data is key to fixing broken AI applications. Hamel advocates for spreadsheet error analysis over complex dashboards. He also highlights the pitfalls of trusting LLM judges blindly and critiques existing AI dashboard metrics. His practical methods will transform how developers approach model performance and iteration in AI.
