
22 - Shard Theory with Quintin Pope
AXRP - the AI X-risk Research Podcast
The Shard Theory of Human Values
Shard theory is an accounting of how simple reinforcement learning processes can produce things that at least look quite a bit like human values. It's intended to be an alternative perspective on what values are and how they arise, which contrasts with expected utility theory, for example. The reward that you train a deep system on is not its values; it's like the chisel that shapes those values. And its values, insofar as they're actually a thing at all, are very much "contextually activated decision influences," the phrase we often use.