

Navigating AI Evaluation and Observability with Atin Sanyal
Jun 3, 2025
Atin Sanyal, Co-founder and CTO of Galileo, has a rich background in machine learning from companies like Uber and Apple. He dives into the intriguing challenges of AI evaluation, emphasizing the need for greater reliability in GenAI outputs. Atin discusses Galileo's ChainPoll methodology for detecting hallucinations in language models and the importance of evolving AI quality metrics. He highlights the critical role of data quality and secure computing for enterprises, and points to a future where AI is applied safely and responsibly.
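
As a rough illustration of the ChainPoll idea (poll an LLM judge several times with a chain-of-thought prompt, then average its yes/no verdicts), here is a minimal Python sketch. The prompt wording and the `call_llm` helper are placeholders for whatever chat-completion client you use, not Galileo's actual implementation.

```python
# Minimal ChainPoll-style hallucination score: poll an LLM judge n times
# with a chain-of-thought prompt and average the yes/no verdicts.
# `call_llm` is a hypothetical stand-in, NOT Galileo's API.
from typing import Callable

JUDGE_PROMPT = (
    "Context:\n{context}\n\n"
    "Response:\n{response}\n\n"
    "Think step by step, then answer on the last line with only 'yes' "
    "if the response contains claims unsupported by the context, or 'no' otherwise."
)

def chainpoll_score(
    context: str,
    response: str,
    call_llm: Callable[[str], str],
    n_polls: int = 5,
) -> float:
    """Fraction of judge runs that flag the response as hallucinated (0..1)."""
    prompt = JUDGE_PROMPT.format(context=context, response=response)
    votes = 0
    for _ in range(n_polls):
        # Keep only the judge's final line, which should be 'yes' or 'no'.
        verdict = call_llm(prompt).strip().splitlines()[-1].lower()
        votes += verdict.startswith("yes")
    return votes / n_polls

if __name__ == "__main__":
    # Stub judge so the sketch runs without an API key: always answers "no".
    print(chainpoll_score(
        "Paris is the capital of France.",
        "The capital of France is Paris.",
        call_llm=lambda prompt: "Reasoning...\nno",
    ))
```

The repeated polling is what distinguishes this from a single judge call: averaging several chain-of-thought verdicts gives a graded score rather than a single brittle yes/no.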
AI Snips
Trust in Generative AI Apps
- Generative AI applications require disciplined evaluation and observability to build trust.
- Even as models grow smarter, their unpredictability increases, which makes robust guardrails necessary from build through production (a minimal guardrail sketch follows).
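
Below is a minimal sketch of what a production-time guardrail could look like, gating a response on a hallucination score such as the ChainPoll-style one above. The threshold, result shape, and fallback message are illustrative assumptions, not Galileo's API.

```python
# Hypothetical guardrail: replace a response with a safe fallback when its
# hallucination score crosses a threshold. Names and threshold are assumptions.
from dataclasses import dataclass

FALLBACK = "Sorry, I can't answer that reliably."

@dataclass
class GuardrailResult:
    allowed: bool
    score: float
    output: str

def apply_guardrail(response: str, hallucination_score: float,
                    block_threshold: float = 0.7) -> GuardrailResult:
    """Block (and substitute) responses whose hallucination score is too high."""
    if hallucination_score >= block_threshold:
        return GuardrailResult(False, hallucination_score, FALLBACK)
    return GuardrailResult(True, hallucination_score, response)

print(apply_guardrail("Paris is the capital of France.", 0.1))
print(apply_guardrail("Lyon is the capital of France.", 0.9))
```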
New Paradigm of AI Observability
- AI evaluation now merges traditional ML evaluation with observability for complex LLM-based applications.
- Measuring the new error types introduced by reasoning LLMs alongside traditional metrics is critical for AI reliability (see the sketch below for one way to log both together).
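
One way to picture this merge of ML evaluation and observability is a per-request record that carries classic signals (latency, output size) next to LLM-specific quality scores. The record shape and field names below are assumptions for illustration, not Galileo's schema.

```python
# Hypothetical per-request evaluation record combining traditional
# observability fields with GenAI quality signals.
import json
import time

def evaluate_request(question: str, answer: str, context: str,
                     hallucination_score: float, latency_s: float) -> dict:
    """Build one log record mixing classic metrics with LLM quality scores."""
    return {
        "timestamp": time.time(),
        "input": question,
        "output": answer,
        # Traditional observability signals.
        "latency_s": latency_s,
        "output_chars": len(answer),
        # LLM-era quality signals (e.g., from a ChainPoll-style judge).
        "hallucination_score": hallucination_score,
        "context_used": bool(context),
    }

record = evaluate_request(
    question="What is the capital of France?",
    answer="Paris.",
    context="Paris is the capital of France.",
    hallucination_score=0.0,
    latency_s=1.3,
)
print(json.dumps(record, indent=2))
```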
Custom Metrics for Complex Errors
- A Fortune 500 firm built a custom metric for "overconfident hallucination": responses that are factually wrong yet stated with full confidence.
- Galileo's platform let them tailor metrics to catch subtle errors unique to their multi-agent Q&A system; a rough sketch of such a metric follows.
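
Here is a rough sketch of how an "overconfident hallucination" metric could be composed from a hallucination score plus a crude confidence check. The hedging-phrase heuristic and threshold are illustrative assumptions, not the firm's actual metric or Galileo's implementation.

```python
# Hypothetical custom metric: flag responses that are both likely
# hallucinated and delivered without any hedging language.
HEDGES = ("i think", "i'm not sure", "possibly", "it may be", "likely")

def confidence_heuristic(response: str) -> float:
    """Crude proxy: treat the response as confident if it contains no hedges."""
    text = response.lower()
    return 0.0 if any(h in text for h in HEDGES) else 1.0

def overconfident_hallucination(response: str, hallucination_score: float,
                                threshold: float = 0.5) -> bool:
    """True when the response is likely hallucinated AND stated confidently."""
    return hallucination_score >= threshold and confidence_heuristic(response) == 1.0

print(overconfident_hallucination("The capital of France is Lyon.", 0.9))  # True
print(overconfident_hallucination("I'm not sure, possibly Lyon?", 0.9))    # False
```

In practice the confidence signal would more likely come from another LLM judge or from model log-probabilities than from a phrase list; the point of the sketch is only that a custom metric can combine several simpler signals.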