5min chapter

Riley Goodside: The Art and Craft of Prompt Engineering

The Gradient: Perspectives on AI

CHAPTER

The Evolution of Instruction Tuned Models

RLHF stands for reinforcement learning from human feedback. You start with an instruction-tuned model, and the model generates many completions for a given prompt. Then these completions are put in order from best to worst by humans. These rankings are used to train a preference model that can imitate the preferences the humans provide and do that ranking automatically on further generations. So it sort of automates this process of the human providing feedback, and allows the model to be fine-tuned on a much greater scale of data.
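
The preference-model step described above can be sketched roughly as follows. This is a toy illustration, not the training setup of any particular lab: it assumes a pairwise logistic (Bradley-Terry style) loss, which is one common choice, and the two-dimensional feature vectors, the linear scorer, and the names `ranking_to_pairs`, `pairwise_loss`, and `train_preference_model` are all hypothetical stand-ins for a real neural reward model.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked):
    """Turn a best-to-worst human ranking into (preferred, rejected) pairs."""
    # combinations() keeps input order, so the first item of each pair ranks higher.
    return list(combinations(ranked, 2))


def pairwise_loss(score_preferred, score_rejected):
    """Pairwise logistic loss: -log(sigmoid(score_preferred - score_rejected))."""
    return math.log1p(math.exp(-(score_preferred - score_rejected)))


def score(weights, feats):
    """Hypothetical linear preference model: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, feats))


def train_preference_model(pairs, features, dim, lr=0.1, epochs=200):
    """Fit the linear scorer with plain SGD on the pairwise loss."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for win, lose in pairs:
            diff = [a - b for a, b in zip(features[win], features[lose])]
            margin = score(weights, diff)
            # d/dw log(1 + exp(-margin)) = -sigmoid(-margin) * diff
            grad_scale = -1.0 / (1.0 + math.exp(margin))
            weights = [w - lr * grad_scale * d for w, d in zip(weights, diff)]
    return weights


# Toy data (assumed): three completions for one prompt, ranked by a human as A > B > C.
features = {"A": [1.0, 0.0], "B": [0.5, 0.5], "C": [0.0, 1.0]}
pairs = ranking_to_pairs(["A", "B", "C"])
weights = train_preference_model(pairs, features, dim=2)

# The trained scorer can now order completions without a human in the loop.
auto_ranking = sorted(features, key=lambda name: score(weights, features[name]), reverse=True)
```

In a full RLHF pipeline the scores from such a model would then serve as the reward signal when fine-tuning the language model, which is the part that lets the human feedback scale to far more generations than people could rank by hand.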
