Reflection AI’s Misha Laskin on the AlphaGo Moment for LLMs

Training Data

Refining Language Models for Reliable Chatbots

This chapter explores how pre-trained language models are refined into reliable chatbots through Reinforcement Learning from Human Feedback (RLHF), which aligns model outputs with user preferences to improve performance. It discusses the challenges of extending this approach beyond chatbots to broader agents, and the importance of ground-truth rewards in reinforcement learning. The conversation then turns to training reward models as a substitute when no ground-truth reward exists for human preferences or complex tasks, and the failure modes this can introduce in AI behavior.
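As a rough illustration of the reward-model idea discussed in this chapter (this is a sketch, not code from the episode), reward models for human preferences are commonly trained with a pairwise Bradley-Terry loss: given scalar scores for a preferred and a dispreferred response, the loss pushes the model to score the preferred one higher. The function name and setup below are illustrative assumptions.

```python
import math

def bradley_terry_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss commonly used to train reward models.

    Given scalar rewards for a preferred ("chosen") and a dispreferred
    ("rejected") response, returns -log sigmoid(r_chosen - r_rejected).
    The loss is small when the chosen response is scored higher, so
    minimizing it teaches the reward model to reproduce human rankings
    even though no ground-truth reward exists.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the margin between chosen and rejected grows,
# and equals log(2) when the model cannot tell the two apart.
```

In RLHF pipelines, a model trained on many such comparisons then serves as the reward signal for reinforcement learning; the episode's point about "potential challenges in AI behavior" corresponds to the policy exploiting imperfections in this learned proxy reward.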
