

96. Jan Leike - AI alignment at OpenAI

Towards Data Science

CHAPTER

Recursive Reward Modelling

The general idea of recursive reward modelling is, it's like we train machine learning models to help us evaluate the task. So if it's an easy enough task, we just know how to do that task. You just train it. And so, for example, one of the evaluation tasks is the task of answering questions about the book. And so what you do is you just take that task, you train a separate model, and you're like, look here, now I'm just training a model to get really good at answering questions about longer pieces of text. The aim with recursive reward modelling was to build some kind of hypotheses on which
