
3 - Negotiable Reinforcement Learning with Andrew Critch
AXRP - the AI X-risk Research Podcast
Theorem of the Stock Market
In this theorem, you sort of assume that policies can be stochastic. I think it's slightly different than what readers might think. At every time step you can randomize what you're going to do. The AI system is like flipping a coin, and its policy is just what the weight of that coin is. And it can also randomize at the outset if it wants. It can choose a random seed at the beginning in order to choose between two different random policies. So it has a memory, in a sense. There are a few different ways of formalizing it. Or you could just have it that the AI can remember
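The two kinds of randomization described above can be illustrated with a minimal Python sketch. This is not code from the episode or the paper; the names `per_step_policy`, `mixed_at_outset`, and `rollout`, and the two-action setup with actions `"a"` and `"b"`, are assumptions made purely for illustration.

```python
import random

def per_step_policy(weight, rng):
    """Hypothetical policy that flips a biased coin at every time step.

    `weight` is the probability of choosing action "a"; otherwise "b".
    """
    def act(_observation):
        return "a" if rng.random() < weight else "b"
    return act

def mixed_at_outset(policies, probs, rng):
    """Randomize once, at the start: draw one policy and keep it.

    Returning the drawn policy is how this sketch models "memory":
    the agent keeps following whichever policy it picked at the outset.
    """
    return rng.choices(policies, weights=probs, k=1)[0]

def rollout(policy, steps):
    """Run a policy for a fixed number of steps (observations are ignored here)."""
    return [policy(None) for _ in range(steps)]

rng = random.Random(0)  # the random seed chosen at the beginning
mostly_a = per_step_policy(0.9, rng)
mostly_b = per_step_policy(0.1, rng)

# One draw at the outset decides which stochastic policy runs for the whole episode.
chosen = mixed_at_outset([mostly_a, mostly_b], [0.5, 0.5], rng)
actions = rollout(chosen, 10)
```

In this sketch the per-step coin flips and the single draw at the outset are separate sources of randomness, which is the distinction the passage draws; how the theorem itself formalizes memory is not specified here.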