22 - Shard Theory with Quintin Pope

AXRP - the AI X-risk Research Podcast

CHAPTER

The Limits of Shard Theory in Deep Learning

Shard theory is meant to evoke the idea that, like with people, it's almost kind of reflexive. I think that for most systems you could construct out of deep learning components, it's still going to be in the shard theory domain. I think part of this perception is that during the shard theory sequence we very much did focus on simple examples of externally clear behaviors with a simple reward function, such as juice consumption. But you can have shards which activate when you're thinking in a rude way and push your thoughts towards not being rude, without this necessarily corresponding to any externally visible sort of reward. These meta-level cognitive processes are built on the workhorse of self-supervised / reinforcement learning.

