22 - Shard Theory with Quintin Pope

AXRP - the AI X-risk Research Podcast

The Shard Theory of Human Values

Shard theory is an accounting of how simple RL-ish learning processes can produce things that at least look quite a bit like human values. It's intended to be an alternative perspective on what values are and how they arise, which contrasts with expected utility theory, for example. The reward that you train a deep system on is not its values; it's the chisel that shapes those values. And its values, insofar as they're actually a thing at all, are very much "contextually activated decision influences", the phrase we often use.
