3min chapter


4 - Risks from Learned Optimization with Evan Hubinger

AXRP - the AI X-risk Research Podcast

CHAPTER

Is There a Difference Between a Model's Mesa-Objective and a Behavioral Objective?

So this is obviously a toy example, but it shows how difficult it can be to extract what a model is doing. If you had some sort of high-level interpretability tools, for example, that let you see the variable names but not the implementation, you might be significantly led astray. We don't claim that this applies to all models. Lots of models just don't fall into the category of mesa-optimizers, but we could still try to get a nice mechanistic description of what that sort of model would be doing. And we're also not trying to come to a conclusion about what a model's objective might be only by looking at its behaviour.
