

LLM Evaluation: Opik with Gideon Mendels
Jan 15, 2025
Gideon Mendels, co-founder and CEO of Comet, discusses Opik, his company's open-source LLM evaluation platform. He shares how Opik's adoption exceeded expectations and emphasizes the critical role of CI/CD in AI development. Gideon also discusses the decline in dedicated machine learning engineering roles and differentiates between genuine and 'fake' open-source solutions. The conversation wraps up with insights on the evolving AI landscape and the need for organizations to adapt to new evaluation methodologies.
Comet's Pivot to LLMs
- Comet, initially focused on machine learning experiment tracking, expanded into LLM evaluation due to customer demand.
- This shift was driven by the rise of LLM APIs like OpenAI and Anthropic.
LLM-Specific Testing
- Traditional unit tests are unsuitable for LLM apps due to semantic variations in responses.
- Opik extends PyTest to enable semantic-level testing, ensuring reliable evaluation.
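The idea above can be illustrated with a minimal PyTest-style sketch. This is not Opik's actual API: `semantic_match` and `fake_llm` are hypothetical stand-ins, and the similarity check here is simple token overlap rather than the embedding- or LLM-judge-based scoring a real evaluation platform would use.

```python
# Hedged sketch of semantic-level testing for LLM outputs (not Opik's API).
# `semantic_match` stands in for a real semantic similarity metric; token
# overlap keeps the example self-contained and runnable.

def semantic_match(answer: str, reference: str, threshold: float = 0.5) -> bool:
    """Return True when enough of the reference's words appear in the answer."""
    ref_tokens = {t.lower().strip(".,!?") for t in reference.split()}
    ans_tokens = {t.lower().strip(".,!?") for t in answer.split()}
    overlap = len(ref_tokens & ans_tokens) / max(len(ref_tokens), 1)
    return overlap >= threshold


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; wording can vary between runs,
    # which is exactly why exact-match assertions break.
    return "Paris is the capital city of France."


def test_capital_question():
    # An exact string comparison would fail on any rewording;
    # the semantic check tolerates variation in phrasing.
    answer = fake_llm("What is the capital of France?")
    assert semantic_match(answer, "The capital of France is Paris")
```

Because the test is a plain PyTest function, it can run in CI/CD like any other unit test, which is the workflow the episode describes.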
Early Days of Language Models
- Gideon Mendels worked on language models before the current LLM boom, focusing on speech recognition and hate speech detection.
- His early experience highlighted the lack of robust tools and practices in the then-nascent field of ML.