
RLHF 201 - with Nathan Lambert of AI2 and Interconnects

Latent Space: The AI Engineer Podcast


How to Use RLHF to Make the Model Better Without Using PPO

Rejection sampling and best-of-N sampling can improve the quality of generated answers by spending more inference compute, guided by a reward model trained on a preference dataset. The approach, discussed in OpenAI's "Let's Verify Step by Step" paper, generates multiple responses to a prompt, passes each one through the reward model, and selects the response with the highest scalar score. It is a simple, effective technique that many researchers use, and it improves outputs without relying on PPO.
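A minimal sketch of the selection loop described above, assuming a policy exposed as a `generate(prompt)` callable and a reward model exposed as a `reward(prompt, completion)` callable that returns a scalar. Both interfaces, and the toy stand-ins in the `__main__` block, are hypothetical placeholders rather than any specific library's API. The `rejection_sampling_dataset` helper shows one common reading of rejection sampling in RLHF: keep the best-of-N completion for each prompt and reuse those pairs for ordinary supervised fine-tuning.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not a real library API):
# a policy that samples one completion for a prompt, and a reward model
# trained on preference data that maps (prompt, completion) to a scalar.
Generator = Callable[[str], str]
RewardModel = Callable[[str, str], float]


def best_of_n(prompt: str, generate: Generator, reward: RewardModel, n: int = 8) -> Tuple[str, float]:
    """Sample n completions and return the highest-scoring one with its reward.

    Extra inference compute (n samples plus n reward calls) buys a better answer;
    the policy weights are never updated, so no PPO step is involved.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    candidates = [generate(prompt) for _ in range(n)]
    scores = [reward(prompt, c) for c in candidates]
    # max() returns the first index among ties.
    best = max(range(n), key=lambda i: scores[i])
    return candidates[best], scores[best]


def rejection_sampling_dataset(
    prompts: List[str], generate: Generator, reward: RewardModel, n: int = 8
) -> List[Tuple[str, str]]:
    """Collect the best-of-n completion per prompt as (prompt, completion) pairs,
    which could then be used as supervised fine-tuning data."""
    return [(p, best_of_n(p, generate, reward, n)[0]) for p in prompts]


if __name__ == "__main__":
    rng = random.Random(0)

    # Toy stand-ins for illustration only: the "policy" guesses a digit,
    # and the "reward model" prefers guesses close to 4.
    def toy_generate(prompt: str) -> str:
        return str(rng.randint(0, 9))

    def toy_reward(prompt: str, completion: str) -> float:
        return float(-abs(int(completion) - 4))

    answer, score = best_of_n("What is 2 + 2?", toy_generate, toy_reward, n=16)
    print(f"best answer: {answer} (reward {score})")
```

The quality gain comes entirely from the reward model's ranking, so a larger `n` helps only as far as the reward model's scores track real answer quality.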

From 54:05 in the episode.