

From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731
May 13, 2025
Mahesh Sathiamoorthy, co-founder and CEO of Bespoke Labs, dives into the innovative world of reinforcement learning (RL) and its impact on AI agents. He highlights the importance of data curation and evaluation, asserting that RL outperforms traditional prompting methods. The conversation touches on limitations of supervised fine-tuning, reward-shaping strategies, and specialized models like MiniCheck for hallucination detection. Mahesh also discusses tools like Curator and the exciting future of automated AI engineering, promising to make powerful solutions accessible to all.
AI Snips
RL Outperforms Prompting Alone
- Reinforcement learning (RL) offers a more robust way to train models than prompting alone.
- RL teaches the model to reason through problems and to distinguish good behavior from bad.
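The reward-shaping strategies mentioned in the episode can be illustrated with a minimal sketch: a reward function that gives partial credit for following an output format and full credit for a correct answer. The function name, tag format, and weights below are illustrative assumptions, not details from the episode.

```python
import re

def shaped_reward(response: str, expected_answer: str) -> float:
    """Score a model response: reward correct answers and well-formed output.

    The components and weights here are illustrative; real RL pipelines
    tune them against held-out evaluations.
    """
    reward = 0.0
    # Format check: did the model wrap its final answer in <answer> tags?
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match:
        reward += 0.2  # partial credit for following the output format
        if match.group(1).strip() == expected_answer.strip():
            reward += 1.0  # full credit for a correct final answer
    return reward
```

A correctly formatted, correct answer scores 1.2; a correct answer without the expected format scores 0, which is what pushes the model toward both reasoning and format compliance during RL training.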
Data Curation Drives Performance
- Curated data is critical to improving model performance, more so than model changes alone.
- Data curation yields significant alpha and should be the focus, rather than solely tuning model parameters.
Evaluate and Analyze Errors Thoroughly
- Perform both evaluation and error analysis to identify model or agent mistakes.
- Use data visualization and annotation tools to understand issues like hallucinations and retrieval errors.
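The error-analysis step above amounts to annotating evaluation results with failure categories (e.g. hallucination vs. retrieval error) and tallying them. A minimal sketch of that bookkeeping, with hypothetical record and category names not taken from the episode:

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class EvalRecord:
    """One annotated evaluation example (illustrative structure)."""
    question: str
    answer: str
    error_type: Optional[str]  # e.g. "hallucination", "retrieval"; None if correct

def error_breakdown(records: list[EvalRecord]) -> dict[str, float]:
    """Return the share of responses falling into each error category."""
    total = len(records)
    counts = Counter(r.error_type for r in records if r.error_type is not None)
    return {etype: n / total for etype, n in counts.items()}
```

Running this over a labeled evaluation set makes the dominant failure mode visible at a glance, which is what guides whether to fix retrieval, data curation, or the model itself.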