
8 - Assistance Games with Dylan Hadfield-Menell
AXRP - the AI X-risk Research Podcast
Inference
So it seems like maybe during inference, let's say you randomly sample ten reward functions to get their relative likelihoods, and the reward functions have different constants added to the reward of every single state. If I take the expectation over those, then it's sort of like taking the expectation if all the constants were zero, and then adding the expectation of the constants, right? Because expectations are linear. So wouldn't that not affect how you choose between different actions? In theory, no. And with enough samples, no, because those averages cancel out even with ten samples.
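A minimal sketch of the point being made, with made-up numbers rather than the podcast's actual setup: if each sampled reward function has its own additive constant, averaging over samples shifts every action's expected reward by the same amount (the mean of the constants), so the argmax over actions is unchanged.

```python
# Sketch: additive constants in sampled reward functions cancel when
# comparing actions under the sample average. All values are hypothetical.
import random

random.seed(0)

n_samples = 10                      # e.g. ten sampled reward functions
actions = ["a", "b", "c"]

# Base per-action rewards for each sampled reward function.
base = [{a: random.gauss(0.0, 1.0) for a in actions} for _ in range(n_samples)]
# A different constant added to every state's reward in each sample.
constants = [random.uniform(-5.0, 5.0) for _ in range(n_samples)]
shifted = [{a: base[i][a] + constants[i] for a in actions}
           for i in range(n_samples)]

def mean_reward(samples):
    """Average reward per action across the sampled reward functions."""
    return {a: sum(s[a] for s in samples) / len(samples) for a in actions}

mu_base = mean_reward(base)
mu_shifted = mean_reward(shifted)

# Every action is shifted by the same amount: the mean of the constants.
# So the best action is the same under either average.
best_base = max(actions, key=mu_base.get)
best_shifted = max(actions, key=mu_shifted.get)
print(best_base, best_shifted)
```

By linearity of expectation, the per-sample constants only move the whole average up or down; they never change which action comes out on top.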


