GPU Considerations, Labeling Privacy, Rapid Fine Tuning, and the Role of Private Eval Pipelines to Benchmark New Models

Aug 9, 2025

Guests: Paul van der Boor, Zulkuf Genc

Zulkuf Genc, Director of AI at Prosus Group, and Paul van der Boor, VP of AI at Prosus Group, discuss running AI agents in production. They cover GPU management and the importance of evaluation criteria tailored to your own tasks. They address privacy concerns with tagging methods for user data. They also argue that building specialized models quickly is essential given how fast AI technology is evolving. Their insights offer practical lessons for anyone navigating large-scale AI deployments.

AI Snips

INSIGHT

Benchmarks Fail For Domain Tasks

Benchmarks shown on public leaderboards rarely predict real-world performance on niche tasks.
Evaluate models on your own domain data and languages before trusting vendor claims.

ADVICE

Private Eval Leaderboards For Fast Signals

Build an internal leaderboard to benchmark many models quickly on the tasks you care about.
Use those quick signals to decide which models to investigate and move toward production.
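The sketch below shows one minimal form such an internal leaderboard could take. It assumes each candidate model is wrapped as a plain function from prompt to text, and that outputs are scored by exact match against labeled examples. EvalExample, exact_match and build_leaderboard are hypothetical names, not a Prosus tool. Real tasks will usually need task-specific metrics. Scores are broken out per language, because public benchmarks may not reflect performance in the languages you actually serve.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalExample:
    # Hypothetical record for one labeled example from a private eval set.
    prompt: str
    expected: str
    language: str  # e.g. "en", "pt"


def exact_match(prediction: str, expected: str) -> float:
    # Assumed metric: case- and whitespace-insensitive exact match.
    return float(prediction.strip().lower() == expected.strip().lower())


def build_leaderboard(
    models: dict[str, Callable[[str], str]],
    examples: list[EvalExample],
) -> list[tuple[str, float, dict[str, float]]]:
    """Score every model on every example; return rows sorted by overall score."""
    if not examples:
        raise ValueError("eval set is empty")
    rows = []
    for name, generate in models.items():
        per_lang: dict[str, list[float]] = {}
        for ex in examples:
            score = exact_match(generate(ex.prompt), ex.expected)
            per_lang.setdefault(ex.language, []).append(score)
        # Mean score per language, plus the overall mean across all examples.
        lang_means = {lang: sum(s) / len(s) for lang, s in per_lang.items()}
        overall = sum(sum(s) for s in per_lang.values()) / len(examples)
        rows.append((name, overall, lang_means))
    # Highest overall score first; ties keep insertion order.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    # Hypothetical stub "models" standing in for real API or local model calls.
    eval_set = [
        EvalExample("Capital of France?", "Paris", "en"),
        EvalExample("Capital da França?", "Paris", "pt"),
    ]
    candidates = {
        "stub-always-paris": lambda prompt: "Paris",
        "stub-echo": lambda prompt: prompt,
    }
    for name, overall, by_lang in build_leaderboard(candidates, eval_set):
        print(f"{name}: {overall:.2f} {by_lang}")
```

Keeping models behind a common prompt-to-text interface means a newly released model can be added to the leaderboard with one dictionary entry. That is what makes the quick-signal loop cheap to run.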

ANECDOTE

Pizza-Powered Labeling Sessions

We run internal labeling parties with pizza and snacks to build high-quality eval sets quickly.
Those sessions also expose non-AI staff to the work and produce usable ground truth for benchmarks.