The Gradient: Perspectives on AI

Riley Goodside: The Art and Craft of Prompt Engineering


The Evolution of Instruction Tuned Models

RLHF stands for reinforcement learning from human feedback. You start with an instruction-tuned model; the model generates many completions to a given prompt, and then these completions are ranked from best to worst by humans. These rankings are used to train a preference model that can imitate the preferences the humans provide, and be able to do that ranking automatically on further generations. So it sort of automates this process of the human providing feedback, and allows fine-tuning on a much greater scale of data.
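The preference-model step described above is often trained with a pairwise (Bradley-Terry style) loss: given a human ranking, the reward model is pushed to score the preferred completion above the rejected one. The sketch below is purely illustrative, not the actual training code of any system; the `reward` function, its length-based feature, and all names are hypothetical stand-ins for a learned reward model.

```python
import math

def pairwise_preference_loss(score_preferred, score_rejected):
    # Bradley-Terry style loss: -log sigmoid(r_preferred - r_rejected).
    # Minimizing it pushes the preferred completion's score above the
    # rejected completion's score.
    diff = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

def reward(weight, completion):
    # Toy "reward model": a single weight on a crude feature
    # (completion length). Purely illustrative, not a real scorer.
    return weight * len(completion)

# One human-ranked pair, and one gradient step on the toy model.
preferred, rejected = "a helpful, detailed answer", "meh"
weight, lr = 0.0, 0.01

# Gradient of the loss w.r.t. the weight:
# d(loss)/dw = -(1 - sigmoid(diff)) * (len(preferred) - len(rejected))
diff = reward(weight, preferred) - reward(weight, rejected)
sigmoid = 1.0 / (1.0 + math.exp(-diff))
grad = -(1.0 - sigmoid) * (len(preferred) - len(rejected))
weight -= lr * grad

loss_before = pairwise_preference_loss(
    reward(0.0, preferred), reward(0.0, rejected))
loss_after = pairwise_preference_loss(
    reward(weight, preferred), reward(weight, rejected))
```

After the step, `loss_after` is lower than `loss_before`: the toy model has learned to rank the preferred completion higher, which is the mechanism that lets a trained preference model stand in for the human rankers at scale.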

