MLOps.community

GPU Considerations, Labeling Privacy, Rapid Fine Tuning, and the Role of Private Eval Pipelines to Benchmark New Models

Aug 9, 2025
Zulkuf Genc, Director of AI at Prosus Group, and Paul van der Boor, VP of AI at Prosus Group, discuss running AI agents in production. They cover the practicalities of GPU management and why evaluation criteria should be tailored to your own use cases. They address privacy concerns with tagging methods for user data, and argue that rapidly developing specialized models is essential given how fast AI technology is evolving. The conversation offers practical lessons for teams running large-scale AI deployments.
AI Snips
INSIGHT

Benchmarks Fail For Domain Tasks

  • Benchmarks shown on public leaderboards rarely predict real-world performance on niche tasks.
  • Evaluate models on your own domain data and languages before trusting vendor claims.
ADVICE

Private Eval Leaderboards For Fast Signals

  • Build an internal leaderboard to benchmark many models quickly on the tasks you care about.
  • Use those quick signals to decide which models to investigate and move toward production.
ANECDOTE

Pizza-Powered Labeling Sessions

  • We run internal labeling parties with pizza and snacks to build high-quality eval sets quickly.
  • Those sessions also expose non-AI staff to the work and produce usable ground truth for benchmarks.