MLOps.community

Making AI Reliable is the Greatest Challenge of the 2020s // Alon Bochman // #312

May 6, 2025
Alon Bochman, CEO of RagMetrics and AI veteran, dives into the complexities of making AI reliable. He emphasizes empirical evaluation over influencer advice, advocating for collaboration between technical and domain experts. Alon discusses the importance of tailoring AI solutions and involving subject matter experts in development. The conversation also covers fine-tuning language models through expert feedback and the challenges of AI in finance, highlighting the need for effective knowledge-sharing to enhance accuracy in decision-making.
INSIGHT

Empirical Testing Beats Advice

  • No one knows what AI setup works best for your specific task, not even the leading AI labs.
  • Empirically test different models and configurations using your own evaluation benchmarks to find what really works.
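To make the comparison above concrete, here is a minimal sketch of scoring two candidate configurations against the same small evaluation set. Everything in it is an assumption for illustration: `EVAL_SET`, `candidate_a`, and `candidate_b` are hypothetical stand-ins (the two candidates return canned answers rather than calling a real model), and exact-match accuracy is only one of many possible metrics.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical eval set: (input, expected answer) pairs written for your own task.
EVAL_SET: List[Tuple[str, str]] = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
    ("3 * 3", "9"),
]


def candidate_a(prompt: str) -> str:
    """Stand-in for one model/configuration; gets the last case wrong."""
    answers = {"2 + 2": "4", "capital of France": "Paris", "3 * 3": "6"}
    return answers.get(prompt, "")


def candidate_b(prompt: str) -> str:
    """Stand-in for a second model/configuration; answers every case."""
    answers = {"2 + 2": "4", "capital of France": "Paris", "3 * 3": "9"}
    return answers.get(prompt, "")


def accuracy(model: Callable[[str], str], eval_set: List[Tuple[str, str]]) -> float:
    """Fraction of eval cases where the output exactly matches the expected answer."""
    correct = sum(1 for prompt, expected in eval_set if model(prompt).strip() == expected)
    return correct / len(eval_set)


def rank_candidates(
    candidates: Dict[str, Callable[[str], str]],
    eval_set: List[Tuple[str, str]],
) -> List[Tuple[str, float]]:
    """Score every candidate on the same eval set and sort best-first."""
    scores = [(name, accuracy(fn, eval_set)) for name, fn in candidates.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


RANKING = rank_candidates(
    {"candidate_a": candidate_a, "candidate_b": candidate_b}, EVAL_SET
)
for name, score in RANKING:
    print(f"{name}: {score:.2f}")
```

In practice the stand-in functions would wrap real model calls, and the point is that the ranking comes from your own cases rather than from a public leaderboard.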
ADVICE

Start With Clear Eval Benchmarks

  • Define clear, specific evaluation examples from the user's perspective early on.
  • Use these examples as benchmarks to compare different models or system components.
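One possible way to write such examples down is as small records that pair a user input with what an acceptable answer must and must not contain. The sketch below is an assumption, not a method described in the episode: the `EvalExample` class, its field names, and the refund scenario are all hypothetical, and substring checks are a deliberately simple stand-in for richer grading.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EvalExample:
    """A single user-facing eval case (hypothetical schema)."""

    name: str
    user_input: str
    must_include: List[str] = field(default_factory=list)
    must_not_include: List[str] = field(default_factory=list)

    def passes(self, output: str) -> bool:
        """True if all required phrases appear and no forbidden phrase does."""
        text = output.lower()
        has_required = all(p.lower() in text for p in self.must_include)
        has_forbidden = any(p.lower() in text for p in self.must_not_include)
        return has_required and not has_forbidden


# Hypothetical example written the way a user would ask the question.
EXAMPLE = EvalExample(
    name="refund_window",
    user_input="How long do I have to return an item?",
    must_include=["30 days"],
    must_not_include=["no refunds"],
)

print(EXAMPLE.passes("You can return items within 30 days of delivery."))
```

Because each example carries its own pass criteria, the same list can be reused unchanged when swapping a model or another system component.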
ADVICE

Rerun All Evals After Changes

  • Always rerun your full evaluation set after every change so you catch previously passing cases that now fail.
  • This continuous evaluation catches unexpected regressions and maintains system quality.
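A regression check of this kind can be sketched as running the whole eval set before and after a change and listing the cases that used to pass and no longer do. The names here (`CASES`, `old_system`, `new_system`, `find_regressions`) are hypothetical, and the two systems are canned lookups standing in for real pipelines.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical eval cases: name -> (input, expected output).
CASES: Dict[str, Tuple[str, str]] = {
    "greeting": ("hi", "hello"),
    "farewell": ("bye", "goodbye"),
    "thanks": ("thx", "you're welcome"),
}


def old_system(prompt: str) -> str:
    """Stand-in for the system before the change."""
    return {"hi": "hello", "bye": "goodbye"}.get(prompt, "")


def new_system(prompt: str) -> str:
    """Stand-in for the system after the change: fixes 'thanks', breaks 'farewell'."""
    return {"hi": "hello", "thx": "you're welcome"}.get(prompt, "")


def run_evals(
    system: Callable[[str], str], cases: Dict[str, Tuple[str, str]]
) -> Dict[str, bool]:
    """Run every case, not just the ones related to the change."""
    return {name: system(prompt) == expected for name, (prompt, expected) in cases.items()}


def find_regressions(before: Dict[str, bool], after: Dict[str, bool]) -> List[str]:
    """Cases that passed before the change and fail after it."""
    return sorted(
        name for name, passed in before.items() if passed and not after.get(name, False)
    )


BEFORE = run_evals(old_system, CASES)
AFTER = run_evals(new_system, CASES)
REGRESSIONS = find_regressions(BEFORE, AFTER)
print("regressions:", REGRESSIONS)
```

Here the change fixes one case while silently breaking another, which a run limited to the newly targeted case would have missed.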