The Importance of Custom Evaluations and Synthetic Data in AI Development

Exploring the significance of custom evaluations in AI development over benchmarks, emphasizing the utility of synthetic data for tailoring datasets to suit specific application needs.

Play episode from 31:23

chevron_right

Transcript

chevron_right

Transcript

Episode notes

Join us at our first in-person conference on June 25, all about AI Quality: https://www.aiqualityconference.com/

Navigating the AI Frontier: The Power of Synthetic Data and Agent Evaluations in LLM Development // MLOps podcast #241 with Boris Selitser, Co-Founder and CTO/CPO of Okareo.

A big thank you to LatticeFlow for sponsoring this episode! LatticeFlow - https://latticeflow.ai/

// Abstract

Explore the evolving landscape of building LLM applications, focusing on the critical roles of synthetic data and agent evaluations. Discover how synthetic data enhances model behavior description, prototyping, testing, and fine-tuning, driving robustness in LLM applications. Learn about the latest methods for evaluating complex agent-based systems, including RAG-based evaluations, dialog-level assessments, simulated user interactions, and adversarial models. This talk delves into the specific challenges developers face and the tradeoffs involved in each evaluation approach, providing practical insights for effective AI development.

// Bio

Boris is the Co-Founder and CTO/CPO at Okareo. Okareo is a full-cycle platform for developers to evaluate and customize AI/LLM applications. Before Okareo, Boris was Director of Product at Meta/Facebook, leading teams building internal platforms and ML products. Examples include a copyright classification system across the Facebook apps and an engagement platform for over 200K developers, 500K+ creators, and 12M+ Oculus users. Boris has a bachelor’s in Computer Science from UC Berkeley.

// MLOps Jobs board

jobs.mlops.community

// MLOps Swag/Merch

https://mlops-community.myshopify.com/

// Related Links

https://docs.okareo.com/blog/data_loop https://docs.okareo.com/blog/agent_eval

The Real E2E RAG Stack // Sam Bean // MLOps Podcast #217 - https://youtu.be/8uZst7pgOw0

RecSys at Spotify // Sanket Gupta // MLOps Podcast #232 - https://youtu.be/byH-ARJA4gk

--------------- ✌️Connect With Us ✌️ -------------

Join our Slack community: https://go.mlops.community/slack

Catch all episodes, blogs, newsletters, and more: https://mlops.community/

Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/