Inside s1: An o1-Style Reasoning Model That Cost Under $50 to Train with Niklas Muennighoff - #721

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

CHAPTER

Evaluating Language Models: Biases and Learning Strategies

This chapter examines the difficulties of evaluating responses generated by large language models, focusing on biases and on the challenge of extracting accurate answers from model outputs. It contrasts supervised fine-tuning with reinforcement learning (RL) for model training, emphasizing RL's potential to make models more adaptable. The discussion also covers algorithms such as Group Relative Policy Optimization (GRPO) and highlights the importance of performance scaling and of replication in research.
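GRPO is commonly described as a variant of PPO that drops the learned value model. It samples a group of responses to the same prompt and scores each response against its own group by normalizing the rewards with the group mean and standard deviation. The sketch below covers only that advantage step. It does not include the clipped policy-gradient update or the KL penalty toward a reference model. The function name, the epsilon term, the use of the population standard deviation, and the 0/1 correctness reward are all illustrative assumptions, not details taken from the episode.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Hypothetical helper: score each sampled response relative to its group.

    All rewards must come from responses to the SAME prompt. The advantage is
    (r - group_mean) / (group_std + eps). Using pstdev (population std) and
    eps are assumptions; implementations differ on both.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # If every response gets the same reward, sigma is 0, so every advantage
    # is 0 and this group contributes no learning signal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical example: four sampled answers to one prompt, rewarded 1.0 if
# the extracted final answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Normalizing within the group means a correct answer is rewarded only relative to the model's other attempts at the same question. Because the group mean serves as the baseline, GRPO does not need a separate critic network.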

