Build Better AI Agents with RL & Fine-Tuning (Kyle from OpenPipe)

AI Tinkerers - "One-Shot"

Formulating Rewards: From Ground Truth to RL

Kyle explains how to turn synthetic question-answer pairs into a scoring function for RL: because each question has a known ground-truth answer, every agent rollout can be scored against it.
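
As a rough illustration of the idea, here is a minimal sketch of a ground-truth reward in Python. It is not taken from the episode: the names `QAPair`, `normalize`, `reward`, and `score_rollouts` are hypothetical, and the whole-word containment check is an assumed stand-in for whatever comparison is actually used (an LLM judge or a fuzzier match would slot into the same place).

```python
import re
from dataclasses import dataclass


@dataclass
class QAPair:
    """A synthetic question with its known ground-truth answer (hypothetical type)."""
    question: str
    answer: str


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def reward(final_answer: str, pair: QAPair) -> float:
    """Return 1.0 if the ground truth appears as whole words in the answer, else 0.0.

    Assumption: a binary containment check is enough for illustration.
    """
    truth = normalize(pair.answer)
    if not truth:
        return 0.0
    # Pad with spaces so "14" does not match inside "141".
    return 1.0 if f" {truth} " in f" {normalize(final_answer)} " else 0.0


def score_rollouts(final_answers: list[str], pair: QAPair) -> list[float]:
    """Score the final answer of each rollout for the same question."""
    return [reward(answer, pair) for answer in final_answers]
```

Scoring several rollouts of the same question this way yields one scalar per rollout, which is the kind of signal an RL trainer can compare across rollouts.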

Play episode from 18:59