7 - Side Effects with Victoria Krakovna

AXRP - the AI X-risk Research Podcast

CHAPTER

Is the Reward Function Job, Instead of Job?

This is a reward that just gives you reward any time you push the vase off the conveyor belt. The first time that I made this environment, it had this sort of, like, you know, gaming behavior of, you know, putting it on and off. And if the environment was more realistic, then the agent would probably be able to push it on and off. So this is, like, yeah, I think it's generally pointing towards this reward function just not being very well designed. So here you could say that, like, preventing bad offsetting is [unclear]. If we view the impact measure as, like, specifying what not to do, or
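The gaming behavior described above can be illustrated with a toy sketch. This is not the environment from the episode. It is a hypothetical minimal model in which an object starts on a conveyor belt and the agent can push it off, put it back on, or do nothing. The names `event_reward`, `rollout`, and the `reward_once` flag are assumptions made for illustration only.

```python
# Hypothetical toy model (not the environment discussed in the episode):
# an object starts on a conveyor belt; actions are "off", "on", or "noop".

def event_reward(prev_on_belt, now_on_belt):
    """Pay +1 every time the object goes from on the belt to off the belt."""
    return 1 if prev_on_belt and not now_on_belt else 0


def rollout(actions, reward_once=False):
    """Return the total reward for a sequence of actions.

    With reward_once=True (an assumed alternative design), only the first
    removal from the belt is rewarded.
    """
    on_belt = True
    already_rewarded = False
    total = 0
    for action in actions:
        prev_on_belt = on_belt
        if action == "off":
            on_belt = False
        elif action == "on":
            on_belt = True
        reward = event_reward(prev_on_belt, on_belt)
        if reward_once:
            if already_rewarded:
                reward = 0
            elif reward:
                already_rewarded = True
        total += reward
    return total


remove_once = ["off", "noop", "noop", "noop", "noop", "noop"]
toggle = ["off", "on", "off", "on", "off", "on"]

print(rollout(remove_once))               # per-event reward, single removal
print(rollout(toggle))                    # per-event reward, repeated toggling
print(rollout(toggle, reward_once=True))  # one-time reward, repeated toggling
```

Under the per-event reward, the toggling policy collects more total reward than removing the object once, which is the sense in which the reward function can be gamed. The one-time variant removes that incentive in this sketch, but it does not by itself penalize putting the object back on the belt.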
