
Latent Space: The AI Engineer Podcast
Why RL Won — Kyle Corbitt, OpenPipe (acq. CoreWeave)
Oct 16, 2025

Kyle Corbitt, co-founder and CEO of OpenPipe, discusses the shift from fine-tuning to reinforcement learning in AI. He highlights that many AI projects fail not because of capability gaps but because of reliability issues that continuous learning can resolve. Kyle introduces RULER, a system that simplifies reward assignment by using LLMs to judge agent behaviors. He also critiques the impracticalities of GRPO and emphasizes the importance of realistic training environments for AI agents. Finally, he shares insights on the future of continual learning and serverless RL.
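The reward-assignment idea behind RULER is to replace hand-written reward functions with an LLM judge that scores a group of agent trajectories relative to each other. Below is a minimal sketch of that LLM-as-judge pattern, assuming the OpenAI Python SDK; it illustrates the general technique only, not RULER's actual API, and the judge model, prompt, and function names are assumptions.

```python
# Minimal LLM-as-judge reward sketch (illustrative; not RULER's actual API).
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment.
import json
from openai import OpenAI

client = OpenAI()

def score_trajectories(task: str, trajectories: list[str]) -> list[float]:
    """Ask an LLM judge to score each agent trajectory from 0.0 to 1.0.

    Scoring a group of trajectories together (rather than one at a time)
    lets the judge rank them relative to each other, which is the core idea
    behind using an LLM in place of a hand-written reward function.
    """
    numbered = "\n\n".join(
        f"Trajectory {i}:\n{t}" for i, t in enumerate(trajectories)
    )
    prompt = (
        f"Task: {task}\n\n{numbered}\n\n"
        "Score each trajectory from 0.0 (failed) to 1.0 (perfect). "
        'Reply with JSON: {"scores": [...]} with one score per trajectory.'
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # judge model is an assumption
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    scores = json.loads(resp.choices[0].message.content)["scores"]
    return [float(s) for s in scores]

# These scores can then serve as rewards for a batch of rollouts in an
# RL step (e.g., a GRPO-style update over sampled trajectories).
rewards = score_trajectories(
    "Book a refundable flight under $400",
    ["...agent transcript A...", "...agent transcript B..."],
)
```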
AI Snips
Distilling GPT-4 Workflows For Production
- GPT-4's high cost created an early market to distill expensive workflows into smaller models for production deployment.
- OpenPipe captured requests and responses to train compact, task-specific models as drop-in replacements (see the sketch below).
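A minimal sketch of that capture-and-distill loop, assuming an OpenAI-style chat API; the wrapper and file names are illustrative, and the logged rows follow the standard chat fine-tuning JSONL layout.

```python
# Sketch of capturing production request/response pairs for distillation.
# Wrapper and file names are illustrative, not OpenPipe's actual SDK.
import json
from openai import OpenAI

client = OpenAI()
LOG_PATH = "captured_requests.jsonl"

def chat_with_capture(messages: list[dict], model: str = "gpt-4") -> str:
    """Call the expensive frontier model and log the exchange for later distillation."""
    resp = client.chat.completions.create(model=model, messages=messages)
    reply = resp.choices[0].message.content
    # Each logged row is already in the chat fine-tuning format:
    # the prompt messages plus the frontier model's reply as the target.
    row = {"messages": messages + [{"role": "assistant", "content": reply}]}
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(row) + "\n")
    return reply

# Later, the accumulated JSONL file becomes the training set for a smaller
# model (e.g., via a fine-tuning job), which is then swapped in as a
# drop-in replacement for the frontier-model call above.
```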
Market Squeeze From Cheaper Frontier Models
- Frontier model price drops repeatedly eroded the business case for distillation startups.
- Open-source models improved, reducing the gap that justified managed distillation services.
Fine-Tune Only For Latency, Cost, Or Consistency
- Only fine-tune when you need lower latency, lower cost, or more consistency than base models can provide.
- In most other cases, avoid fine-tuning and use a sufficiently capable base model instead.
