
Latent Space: The AI Engineer Podcast
Why RL Won — Kyle Corbitt, OpenPipe (acq. CoreWeave)
Oct 16, 2025

Kyle Corbitt, co-founder and CEO of OpenPipe, discusses the shift from fine-tuning to reinforcement learning in AI. He highlights that many AI projects fail not because of capability gaps but because of reliability issues that continuous learning can resolve. Kyle introduces RULER, a system that simplifies reward assignment by using LLMs to judge agent behaviors. He also critiques the impracticalities of GRPO and emphasizes the importance of realistic training environments for AI agents. Finally, he shares insights on the future of continual learning and serverless RL.
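The reward-assignment idea behind RULER is to replace hand-written reward functions with an LLM judge that scores a group of agent trajectories relative to each other. Below is a minimal sketch of that LLM-as-judge pattern, assuming the OpenAI Python SDK; it illustrates the general technique only, not RULER's actual API, and the judge model, prompt, and function names are assumptions.

```python
# Minimal LLM-as-judge reward sketch (illustrative; not RULER's actual API).
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment.
import json
from openai import OpenAI

client = OpenAI()

def score_trajectories(task: str, trajectories: list[str]) -> list[float]:
    """Ask an LLM judge to score each agent trajectory from 0.0 to 1.0.

    Scoring a group of trajectories together (rather than one at a time)
    lets the judge rank them relative to each other, which is the core idea
    behind using an LLM in place of a hand-written reward function.
    """
    numbered = "\n\n".join(
        f"Trajectory {i}:\n{t}" for i, t in enumerate(trajectories)
    )
    prompt = (
        f"Task: {task}\n\n{numbered}\n\n"
        "Score each trajectory from 0.0 (failed) to 1.0 (perfect). "
        'Reply with JSON: {"scores": [...]} with one score per trajectory.'
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # judge model is an assumption
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    scores = json.loads(resp.choices[0].message.content)["scores"]
    return [float(s) for s in scores]

# These scores can then serve as rewards for a batch of rollouts in an
# RL step (e.g., a GRPO-style update over sampled trajectories).
rewards = score_trajectories(
    "Book a refundable flight under $400",
    ["...agent transcript A...", "...agent transcript B..."],
)
```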
AI Snips
Distilling GPT-4 Workflows For Production
- GPT-4's high cost created an early market to distill expensive workflows into smaller models for production deployment.
- OpenPipe captured requests and responses to train compact, task-specific models as drop-in replacements (see the sketch below).
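A minimal sketch of that capture-and-distill loop, assuming an OpenAI-style chat API; the wrapper and file names are illustrative, and the logged rows follow the standard chat fine-tuning JSONL layout.

```python
# Sketch of capturing production request/response pairs for distillation.
# Wrapper and file names are illustrative, not OpenPipe's actual SDK.
import json
from openai import OpenAI

client = OpenAI()
LOG_PATH = "captured_requests.jsonl"

def chat_with_capture(messages: list[dict], model: str = "gpt-4") -> str:
    """Call the expensive frontier model and log the exchange for later distillation."""
    resp = client.chat.completions.create(model=model, messages=messages)
    reply = resp.choices[0].message.content
    # Each logged row is already in the chat fine-tuning format:
    # the prompt messages plus the frontier model's reply as the target.
    row = {"messages": messages + [{"role": "assistant", "content": reply}]}
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(row) + "\n")
    return reply

# Later, the accumulated JSONL file becomes the training set for a smaller
# model (e.g., via a fine-tuning job), which is then swapped in as a
# drop-in replacement for the frontier-model call above.
```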
Market Squeeze From Cheaper Frontier Models
- Frontier model price drops repeatedly eroded the business case for distillation startups.
- Open-source models improved, reducing the gap that justified managed distillation services.
Fine-Tune Only For Latency, Cost, Or Consistency
- Only fine-tune when you need lower latency, lower cost, or more consistency than base models can provide.
- In most other cases, avoid fine-tuning and use a sufficiently capable base model instead.
