DataTalks.Club

How to Build and Evaluate AI systems in the Age of LLMs - Hugo Bowne-Anderson

Oct 24, 2025
Hugo Bowne-Anderson, an independent AI consultant and educator, shares insights from his journey from academia to advising major companies like Netflix and Meta. He discusses how to build reliable AI systems, focusing on practical tips for prompt evaluation and dataset design. Hugo emphasizes the importance of structuring teams for successful AI adoption and offers strategies to avoid common pitfalls like prompt overfitting. Listeners will learn about debugging tools and the evolution of proactive AI agents that enhance productivity in everyday workflows.
ADVICE

Design Prompts And Add An Evaluator Loop

  • Give prompts a clear role, objective, few-shot examples, and heuristics to improve outputs.
  • Build an evaluator-optimizer loop so one model scores outputs and another revises until they pass (see the sketch below).
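A minimal sketch of both tips, assuming a generic `call_llm` placeholder instead of any particular model client; the prompt wording, the 1-10 scoring format, and the pass threshold are illustrative assumptions, not details from the episode.

```python
# Evaluator-optimizer loop: one model drafts, another scores, the first revises
# until the score clears a threshold. `call_llm` is a placeholder for your client.

GENERATOR_PROMPT = """\
Role: you are an editor producing podcast show notes.
Objective: turn the transcript excerpt below into a concise episode summary.
Example: "<short transcript snippet>" -> "<two-sentence summary>"
Heuristics: stay under 120 words, name the guest, avoid hype.

Transcript:
{transcript}
"""

EVALUATOR_PROMPT = """\
Score the summary from 1 to 10 for faithfulness and clarity.
Reply with the number first, then a short critique.

Summary:
{summary}
"""


def call_llm(prompt: str) -> str:
    """Placeholder: swap in your actual model client here."""
    raise NotImplementedError


def generate_with_evaluator(transcript: str, threshold: int = 8, max_rounds: int = 3) -> str:
    summary = call_llm(GENERATOR_PROMPT.format(transcript=transcript))
    for _ in range(max_rounds):
        review = call_llm(EVALUATOR_PROMPT.format(summary=summary))
        score = int(review.split()[0])  # assumes the evaluator follows the "number first" format
        if score >= threshold:          # good enough: stop revising
            return summary
        # Feed the critique back so the generator can revise its own output.
        summary = call_llm(
            GENERATOR_PROMPT.format(transcript=transcript)
            + f"\n\nA reviewer said: {review}\nRevise the summary accordingly."
        )
    return summary
```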
ADVICE

Save Prompts And Automate The Pipeline

  • Save and reuse prompts that perform well across representative examples instead of rewriting the prompt for each transcript.
  • Automate the pipeline (GitHub Actions, etc.) to process transcripts and keep quality consistent at scale (see the batch-script sketch below).
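A rough sketch of the batch step a scheduled job (GitHub Actions, cron, or similar) could invoke; the directory layout, the saved prompt file, and `call_llm` are assumptions for illustration only.

```python
# Reuse a saved, known-good prompt and process only transcripts that haven't
# been summarized yet, so the job is safe to run on a schedule.

from pathlib import Path

# Assumes the saved prompt contains a {transcript} placeholder.
PROMPT = Path("prompts/summarize_transcript.txt").read_text()


def call_llm(prompt: str) -> str:
    """Placeholder: swap in your actual model client here."""
    raise NotImplementedError


def process_new_transcripts(in_dir: str = "transcripts", out_dir: str = "summaries") -> None:
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for transcript in Path(in_dir).glob("*.txt"):
        target = out / transcript.name
        if target.exists():  # skip work that has already been done
            continue
        summary = call_llm(PROMPT.format(transcript=transcript.read_text()))
        target.write_text(summary)


if __name__ == "__main__":
    process_new_transcripts()
```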
INSIGHT

Make Eval Sets Representative And Practical

  • Use a representative, not necessarily huge, evaluation set, and rely on cheaper automated checks where possible to control costs.
  • Inspect the data in spreadsheets to uncover failure modes and to gauge how large your test set needs to be (see the sketch below).
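A sketch of cheap automated checks run over a small, representative eval set, with results written to a CSV you can open in a spreadsheet; the specific checks and the example fields (`id`, `summary`, `guest`) are hypothetical.

```python
# Cheap, deterministic checks over a handful of representative examples,
# dumped to CSV so failure modes are easy to eyeball in a spreadsheet.

import csv


def check_length(summary: str, max_words: int = 120) -> bool:
    return len(summary.split()) <= max_words


def check_mentions_guest(summary: str, guest: str) -> bool:
    return guest.lower() in summary.lower()


def run_evals(examples: list[dict], out_path: str = "eval_results.csv") -> None:
    rows = [
        {
            "id": ex["id"],
            "length_ok": check_length(ex["summary"]),
            "guest_ok": check_mentions_guest(ex["summary"], ex["guest"]),
        }
        for ex in examples  # each example: id, summary, guest
    ]
    if not rows:
        return
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
```

A few dozen well-chosen examples run through checks like these is often enough to surface the failure modes that tell you whether a larger test set is worth the cost.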