
8 - Assistance Games with Dylan Hadfield-Menell
AXRP - the AI X-risk Research Podcast
Inference
So it seems like maybe during inference, let's say you randomly sample ten reward functions to get their relative likelihoods, and the reward functions have different constants added to the reward of every single state. If I take the expectation over those, then it's sort of like taking the expectation if all the constants were zero, and then adding the expectation of the constants, right? Because expectations are linear. So wouldn't that not affect how you choose between different actions? In theory, no. And with enough samples, no, because those averages cancel out even with ten samples.
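A minimal sketch of the point being made, with made-up numbers rather than the podcast's actual setup: if each sampled reward function has its own additive constant, averaging over samples shifts every action's expected reward by the same amount (the mean of the constants), so the argmax over actions is unchanged.

```python
# Sketch: additive constants in sampled reward functions cancel when
# comparing actions under the sample average. All values are hypothetical.
import random

random.seed(0)

n_samples = 10                      # e.g. ten sampled reward functions
actions = ["a", "b", "c"]

# Base per-action rewards for each sampled reward function.
base = [{a: random.gauss(0.0, 1.0) for a in actions} for _ in range(n_samples)]
# A different constant added to every state's reward in each sample.
constants = [random.uniform(-5.0, 5.0) for _ in range(n_samples)]
shifted = [{a: base[i][a] + constants[i] for a in actions}
           for i in range(n_samples)]

def mean_reward(samples):
    """Average reward per action across the sampled reward functions."""
    return {a: sum(s[a] for s in samples) / len(samples) for a in actions}

mu_base = mean_reward(base)
mu_shifted = mean_reward(shifted)

# Every action is shifted by the same amount: the mean of the constants.
# So the best action is the same under either average.
best_base = max(actions, key=mu_base.get)
best_shifted = max(actions, key=mu_shifted.get)
print(best_base, best_shifted)
```

By linearity of expectation, the per-sample constants only move the whole average up or down; they never change which action comes out on top.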


