AI Confidential

Navigating AI Evaluation and Observability with Atin Sanyal

Jun 3, 2025
Atin Sanyal, Co-founder and CTO of Galileo, has a rich background in machine learning from companies like Uber and Apple. He dives into the intriguing challenges of AI evaluation, emphasizing the need for enhanced reliability in GenAI outputs. Atin discusses Galileo's innovative ChainPoll methodology for detecting hallucinations in language models and the importance of evolving AI quality metrics. He highlights the critical role of data quality and secure computing for enterprises, hinting at the fascinating future of AI in safe and responsible applications.
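For context on the ChainPoll methodology mentioned above: as Galileo has described it publicly, the idea is to poll an LLM judge several times with a chain-of-thought prompt about a given answer and average the yes/no verdicts into a hallucination score. The sketch below is a minimal illustration of that polling pattern, not Galileo's implementation; the prompt wording and the `call_llm` helper are hypothetical stand-ins.

```python
# Illustrative sketch of a ChainPoll-style hallucination check: poll an LLM
# judge n times with a chain-of-thought prompt, then average the yes/no votes.
# `call_llm` is a hypothetical stand-in for your own LLM client.

JUDGE_PROMPT = """Question: {question}
Answer to evaluate: {answer}

Think step by step about whether the answer contains any claim that is not
supported by the question or by well-established facts. Finish with a single
line reading exactly "VERDICT: yes" if the answer hallucinates, or
"VERDICT: no" if it does not."""


def call_llm(prompt: str) -> str:
    """Hypothetical helper: send `prompt` to your LLM provider and return its text."""
    raise NotImplementedError("wire this to your own LLM client")


def chainpoll_score(question: str, answer: str, n_polls: int = 5) -> float:
    """Return the fraction of judge runs that flag the answer as hallucinated (0.0-1.0)."""
    votes = 0
    for _ in range(n_polls):
        judgment = call_llm(JUDGE_PROMPT.format(question=question, answer=answer))
        # Use the last VERDICT line the judge produced, after its chain of thought.
        verdicts = [l for l in judgment.splitlines() if l.strip().upper().startswith("VERDICT:")]
        if verdicts and "yes" in verdicts[-1].lower():
            votes += 1
    return votes / n_polls
```

Averaging multiple chain-of-thought verdicts, rather than asking once, is what makes the score graded instead of binary and smooths out single-run judge errors.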
INSIGHT

Trust in Generative AI Apps

  • Generative AI applications require disciplined evaluation and observability to build trust.
  • Even as models get smarter, their outputs become less predictable, necessitating robust guardrails from build through production.
INSIGHT

New Paradigm of AI Observability

  • AI evaluation now merges traditional ML evaluation with observability for complex LLM-based applications.
  • Measuring new error types from reasoning LLMs alongside traditional metrics is critical for AI reliability.
ANECDOTE

Custom Metrics for Complex Errors

  • A Fortune 500 firm defined a custom metric for "overconfident hallucination": responses that are inaccurate yet delivered with confidence.
  • Galileo's platform let them tailor metrics to catch subtle errors unique to their multi-agent Q&A system (a generic sketch of such a metric follows).
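The episode does not spell out how that custom metric works, so the following is only a generic illustration of the idea (not Galileo's implementation): flag answers that a grounding check marks as unsupported and that contain no hedging language. The `is_supported` hook and the hedge-word list are assumptions for the sake of the example.

```python
# Generic illustration of an "overconfident hallucination" flag: the answer is
# unsupported by the available context, yet phrased without any hedging.
import re

# Crude hedging detector; a real system would use something more robust.
HEDGES = re.compile(r"\b(might|may|possibly|perhaps|likely|I think|not sure|unclear)\b",
                    re.IGNORECASE)


def is_supported(answer: str, context: str) -> bool:
    """Hypothetical hook: return True if `answer` is grounded in `context`."""
    raise NotImplementedError("plug in your own factuality / grounding check here")


def overconfident_hallucination(answer: str, context: str) -> bool:
    """True when the answer is ungrounded yet stated confidently (no hedging language)."""
    confident = HEDGES.search(answer) is None
    return confident and not is_supported(answer, context)
```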