Build Better AI Agents with RL & Fine-Tuning (Kyle from OpenPipe)

AI Tinkerers - "One-Shot"

Formulating Rewards: From Ground Truth to RL

Kyle explains converting synthetic QA into a scoring function for RL by using known ground-truth answers to evaluate agent rollouts.

Episode notes

What you’ll learn:

• How reinforcement learning can reduce AI agent error rates by up to 60% and drastically lower inference costs.

• The critical difference between supervised fine-tuning and RL for agentic workflows, and why RL is essential for true agent reliability.

• A practical, code-level walkthrough of building and training an email search agent on a 14-billion-parameter open-source model that outperforms OpenAI’s GPT-3.5.

• Strategies for generating high-quality synthetic data and designing nuanced reward functions with ‘partial credit’ to effectively train your agents.

• Key use cases where RL fine-tuning delivers the most significant benefits, including real-time voice agents and high-volume applications.
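The "partial credit" idea from the list above can be sketched as a small reward function. This is a minimal illustration under stated assumptions, not the scoring code shown in the episode: the `Rollout` fields, the substring check for correctness, and the weights (1.0, 0.3, 0.1) are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One finished agent attempt at a question (hypothetical structure)."""
    final_answer: str       # text the agent returned to the user
    cited_message_id: str   # email the agent claims supports its answer
    turns_used: int         # number of tool-calling turns it took


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so comparisons are less brittle."""
    return " ".join(text.lower().split())


def score_rollout(rollout: Rollout, truth_answer: str,
                  truth_message_id: str, max_turns: int = 10) -> float:
    """Score a rollout against a known ground-truth QA pair.

    Assumed rubric (illustrative weights only):
      +1.0  the ground-truth answer appears in the agent's answer
      +0.3  the agent cited the email the answer came from
      +0.1  scaled bonus for finishing in fewer turns (only if correct)
    """
    reward = 0.0
    # Substring match is a stand-in for whatever correctness check is used.
    answer_ok = normalize(truth_answer) in normalize(rollout.final_answer)
    if answer_ok:
        reward += 1.0
    if rollout.cited_message_id == truth_message_id:
        reward += 0.3  # partial credit even when the answer itself is wrong
    if answer_ok:
        used = min(rollout.turns_used, max_turns)
        reward += 0.1 * (1 - used / max_turns)
    return reward
```

A rollout that finds the right email but misstates the answer still earns a small reward here, which gives the training signal something to work with before the agent is fully correct.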

Kyle Corbitt is the founder of OpenPipe, a platform dedicated to helping enterprises build and deploy customized AI models using advanced fine-tuning and reinforcement learning. He’s a seasoned builder who has been working at the frontier of fine-tuning since before public APIs existed.

Key topics covered:

• The limitations of off-the-shelf LLMs for agent reliability and how RL solves them.

• The importance of latency and cost optimization in real-world AI deployments.

• Detailed explanation of the agentic workflow and tool calling in an email search bot.

• The Enron email dataset as a realistic environment for agent training.

• OpenPipe’s open-source Agent Reinforcement Trainer (ART) library for building RL agents.

• The iterative process of data generation, rubric-based scoring, and model updates.
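The iterative process named in the last item (generate rollouts, score them with a rubric, update the model) might be organized roughly as follows. This is a sketch only and does not use the ART library's API: `rollout_fn`, `score_fn`, and `update_fn` are hypothetical callables supplied by the caller, and centering each group's rewards on its mean is an assumption borrowed from common policy-gradient practice rather than something stated in these notes.

```python
from statistics import mean
from typing import Any, Callable, List, Sequence, Tuple


def centered_rewards(rewards: Sequence[float]) -> List[float]:
    """Subtract the group mean so each rollout is judged against its siblings."""
    group_mean = mean(rewards)
    return [r - group_mean for r in rewards]


def training_step(
    scenarios: Sequence[Any],
    rollout_fn: Callable[[Any], Any],               # hypothetical: runs the agent once
    score_fn: Callable[[Any, Any], float],          # hypothetical: rubric score for a rollout
    update_fn: Callable[[List[Tuple[Any, float]]], None],  # hypothetical: model update
    rollouts_per_scenario: int = 4,
) -> List[Tuple[Any, float]]:
    """Run one iteration: generate rollouts, score them, hand results to the updater."""
    batch: List[Tuple[Any, float]] = []
    for scenario in scenarios:
        # 1. Data generation: several attempts at the same question.
        trajectories = [rollout_fn(scenario) for _ in range(rollouts_per_scenario)]
        # 2. Rubric-based scoring of each attempt.
        rewards = [score_fn(scenario, t) for t in trajectories]
        # 3. Pair each attempt with how much better or worse it did than its group.
        batch.extend(zip(trajectories, centered_rewards(rewards)))
    # 4. Model update on the scored batch (left entirely to the caller).
    update_fn(batch)
    return batch
```

Repeating this step many times is what makes the process iterative: attempts that score above their group's average are reinforced, and those below it are discouraged.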

This episode of AI Tinkerers One-Shot goes under the hood with Kyle to share practical learnings for the community.

💡 Resources:

• OpenPipe Website - https://openpipe.ai

• Kyle Corbitt LinkedIn - https://www.linkedin.com/in/kcorbitt/

• AI Tinkerers - https://aitinkerers.org

• One-Shot Podcast - https://one-shot.aitinkerers.org/

Social Media: @AITinkerers @OpenPipeAI @corbtt

👍 Like this video if you found it valuable, and subscribe to AI Tinkerers One-Shot for more conversations with innovators building the future of AI!

00:00:00 Introduction

00:01:09 Welcome Kyle Corbitt, Founder of OpenPipe

00:01:55 What OpenPipe Does

00:02:31 OpenPipe’s Journey and YC Experience

00:04:13 Email Search Bot Project Overview

00:05:19 Why Fine-Tuning for Email Search

00:06:22 Email Search Bot: Queries and Results

00:09:23 On-Premise Deployment and Data Sensitivity

00:10:45 Agent Trace Example and Tooling

00:13:55 Using the Enron Dataset

00:15:13 Reinforcement Learning Fundamentals

00:17:01 Synthetic Data Generation with Gemini 2.5 Pro

00:18:51 Reliable Q&A Pairs and Data Scale

00:21:59 Fine-Tuning Impact on Model Performance

00:22:25 RL Adoption in Industry and Community

00:24:37 Rollout Function and Agent Implementation

00:27:52 Rubric and Reward Calculation for RL

00:30:39 Training Loop and Model Updates

00:33:52 RL Fine-Tuning vs. OpenAI’s Fine-Tuning

00:40:38 Time Commitment for RL Projects

00:41:55 Use Cases for RL Fine-Tuning

00:45:37 OpenPipe’s Offerings: Open Source, White Glove Service

00:47:07 Kyle’s Side Tinkering and Future of AI

00:49:59 Discovering AI Tinkerers
