

Navigating AI Evaluation and Observability with Atin Sanyal
Jun 3, 2025
Atin Sanyal, Co-founder and CTO of Galileo, has a rich background in machine learning from companies like Uber and Apple. He dives into the intriguing challenges of AI evaluation, emphasizing the need for greater reliability in GenAI outputs. Atin discusses Galileo's ChainPoll methodology for detecting hallucinations in language models and the importance of evolving AI quality metrics. He highlights the critical role of data quality and secure computing for enterprises, and points to a future where AI is applied safely and responsibly.
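
As a rough illustration of the ChainPoll idea (poll an LLM judge several times with a chain-of-thought prompt, then average its yes/no verdicts), here is a minimal Python sketch. The prompt wording and the `call_llm` helper are placeholders for whatever chat-completion client you use, not Galileo's actual implementation.

```python
# Minimal ChainPoll-style hallucination score: poll an LLM judge n times
# with a chain-of-thought prompt and average the yes/no verdicts.
# `call_llm` is a hypothetical stand-in, NOT Galileo's API.
from typing import Callable

JUDGE_PROMPT = (
    "Context:\n{context}\n\n"
    "Response:\n{response}\n\n"
    "Think step by step, then answer on the last line with only 'yes' "
    "if the response contains claims unsupported by the context, or 'no' otherwise."
)

def chainpoll_score(
    context: str,
    response: str,
    call_llm: Callable[[str], str],
    n_polls: int = 5,
) -> float:
    """Fraction of judge runs that flag the response as hallucinated (0..1)."""
    prompt = JUDGE_PROMPT.format(context=context, response=response)
    votes = 0
    for _ in range(n_polls):
        # Keep only the judge's final line, which should be 'yes' or 'no'.
        verdict = call_llm(prompt).strip().splitlines()[-1].lower()
        votes += verdict.startswith("yes")
    return votes / n_polls

if __name__ == "__main__":
    # Stub judge so the sketch runs without an API key: always answers "no".
    print(chainpoll_score(
        "Paris is the capital of France.",
        "The capital of France is Paris.",
        call_llm=lambda prompt: "Reasoning...\nno",
    ))
```

The repeated polling is what distinguishes this from a single judge call: averaging several chain-of-thought verdicts gives a graded score rather than a single brittle yes/no.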
AI Snips
Trust in Generative AI Apps
- Generative AI applications require disciplined evaluation and observability to build trust.
- Even as models grow smarter, their unpredictability increases, which makes robust guardrails necessary from build through production (a minimal guardrail sketch follows).
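
Below is a minimal sketch of what a production-time guardrail could look like, gating a response on a hallucination score such as the ChainPoll-style one above. The threshold, result shape, and fallback message are illustrative assumptions, not Galileo's API.

```python
# Hypothetical guardrail: replace a response with a safe fallback when its
# hallucination score crosses a threshold. Names and threshold are assumptions.
from dataclasses import dataclass

FALLBACK = "Sorry, I can't answer that reliably."

@dataclass
class GuardrailResult:
    allowed: bool
    score: float
    output: str

def apply_guardrail(response: str, hallucination_score: float,
                    block_threshold: float = 0.7) -> GuardrailResult:
    """Block (and substitute) responses whose hallucination score is too high."""
    if hallucination_score >= block_threshold:
        return GuardrailResult(False, hallucination_score, FALLBACK)
    return GuardrailResult(True, hallucination_score, response)

print(apply_guardrail("Paris is the capital of France.", 0.1))
print(apply_guardrail("Lyon is the capital of France.", 0.9))
```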
New Paradigm of AI Observability
- AI evaluation now merges traditional ML evaluation with observability for complex LLM-based applications.
- Measuring the new error types introduced by reasoning LLMs alongside traditional metrics is critical for AI reliability (see the sketch below for one way to log both together).
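
One way to picture this merge of ML evaluation and observability is a per-request record that carries classic signals (latency, output size) next to LLM-specific quality scores. The record shape and field names below are assumptions for illustration, not Galileo's schema.

```python
# Hypothetical per-request evaluation record combining traditional
# observability fields with GenAI quality signals.
import json
import time

def evaluate_request(question: str, answer: str, context: str,
                     hallucination_score: float, latency_s: float) -> dict:
    """Build one log record mixing classic metrics with LLM quality scores."""
    return {
        "timestamp": time.time(),
        "input": question,
        "output": answer,
        # Traditional observability signals.
        "latency_s": latency_s,
        "output_chars": len(answer),
        # LLM-era quality signals (e.g., from a ChainPoll-style judge).
        "hallucination_score": hallucination_score,
        "context_used": bool(context),
    }

record = evaluate_request(
    question="What is the capital of France?",
    answer="Paris.",
    context="Paris is the capital of France.",
    hallucination_score=0.0,
    latency_s=1.3,
)
print(json.dumps(record, indent=2))
```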
Custom Metrics for Complex Errors
- A Fortune 500 firm built a custom metric for "overconfident hallucination": responses that are factually wrong yet stated with full confidence.
- Galileo's platform let them tailor metrics to catch subtle errors unique to their multi-agent Q&A system; a rough sketch of such a metric follows.
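
Here is a rough sketch of how an "overconfident hallucination" metric could be composed from a hallucination score plus a crude confidence check. The hedging-phrase heuristic and threshold are illustrative assumptions, not the firm's actual metric or Galileo's implementation.

```python
# Hypothetical custom metric: flag responses that are both likely
# hallucinated and delivered without any hedging language.
HEDGES = ("i think", "i'm not sure", "possibly", "it may be", "likely")

def confidence_heuristic(response: str) -> float:
    """Crude proxy: treat the response as confident if it contains no hedges."""
    text = response.lower()
    return 0.0 if any(h in text for h in HEDGES) else 1.0

def overconfident_hallucination(response: str, hallucination_score: float,
                                threshold: float = 0.5) -> bool:
    """True when the response is likely hallucinated AND stated confidently."""
    return hallucination_score >= threshold and confidence_heuristic(response) == 1.0

print(overconfident_hallucination("The capital of France is Lyon.", 0.9))  # True
print(overconfident_hallucination("I'm not sure, possibly Lyon?", 0.9))    # False
```

In practice the confidence signal would more likely come from another LLM judge or from model log-probabilities than from a phrase list; the point of the sketch is only that a custom metric can combine several simpler signals.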