

LLM Evaluation: Opik with Gideon Mendels
Jan 15, 2025
Gideon Mendels, co-founder and CEO of Comet, discusses Opik, his company's open-source LLM evaluation platform. He shares how Opik's adoption exceeded expectations and emphasizes the critical role of CI/CD in AI development. Gideon also discusses the decline in dedicated machine learning engineering roles and differentiates between genuine and 'fake' open-source solutions. The conversation wraps up with insights on the evolving AI landscape and the need for organizations to adapt to new evaluation methodologies.
Comet's Pivot to LLMs
- Comet, initially focused on machine learning experiment tracking, expanded into LLM evaluation due to customer demand.
- This shift was driven by the rise of LLM APIs like OpenAI and Anthropic.
LLM-Specific Testing
- Traditional unit tests are unsuitable for LLM apps due to semantic variations in responses.
- Opik extends PyTest to enable semantic-level testing, ensuring reliable evaluation.
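The idea above can be illustrated with a minimal PyTest-style sketch. This is not Opik's actual API: `semantic_match` and `fake_llm` are hypothetical stand-ins, and the similarity check here is simple token overlap rather than the embedding- or LLM-judge-based scoring a real evaluation platform would use.

```python
# Hedged sketch of semantic-level testing for LLM outputs (not Opik's API).
# `semantic_match` stands in for a real semantic similarity metric; token
# overlap keeps the example self-contained and runnable.

def semantic_match(answer: str, reference: str, threshold: float = 0.5) -> bool:
    """Return True when enough of the reference's words appear in the answer."""
    ref_tokens = {t.lower().strip(".,!?") for t in reference.split()}
    ans_tokens = {t.lower().strip(".,!?") for t in answer.split()}
    overlap = len(ref_tokens & ans_tokens) / max(len(ref_tokens), 1)
    return overlap >= threshold


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; wording can vary between runs,
    # which is exactly why exact-match assertions break.
    return "Paris is the capital city of France."


def test_capital_question():
    # An exact string comparison would fail on any rewording;
    # the semantic check tolerates variation in phrasing.
    answer = fake_llm("What is the capital of France?")
    assert semantic_match(answer, "The capital of France is Paris")
```

Because the test is a plain PyTest function, it can run in CI/CD like any other unit test, which is the workflow the episode describes.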
Early Days of Language Models
- Gideon Mendels worked on language models before the current LLM boom, focusing on speech recognition and hate speech detection.
- His early experience highlighted the lack of robust tools and practices in the then-nascent field of ML.