22 - Shard Theory with Quintin Pope

AXRP - the AI X-risk Research Podcast

The Limits of Shard Theory in Deep Learning

Shard theory is meant to evoke the idea of, like, people is almost kind of reflexive. I think that for most systems you could construct out of deep learning components, it's still going to be in the shard theory domain. Okay, so I think part of this perception is that during the shard theory sequence, we very much did focus on simple examples of externally clear behaviors with a simple reward function, such as juice consumption. You can have shards which activate when you're thinking in a rude way and push your thoughts towards not being rude, without this necessarily corresponding to any externally visible sort of reward. These meta-level cognitive processes are built on the workhorse of self-supervised/reinforcement learn
