The Value Iteration Network Isn't Able to Express the Literally Optimal Value
Ah, yes, I did. It depends a bit on your transition function, and a bit on your reward function. It only becomes able to express the literally optimal value function when the horizon is long enough. And I believe you might also need multiple convolutional layers in order to represent the transition function and the reward function. But I'm not sure; it's possible both of those things happened because of the horizon issue. I was very confused why my value network was not learning very well, so I added an additional convolutional layer to the part that represents the reward, and then even the learned version started working much better. That's great.
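For reference, here is a minimal sketch of the kind of value-iteration module being discussed, written in PyTorch. The layer widths, kernel sizes, number of iterations, and the extra convolution on the reward head are illustrative assumptions, not details taken from the conversation; the point is only to show where a deeper reward head and a longer iteration horizon enter the recurrence.

```python
import torch
import torch.nn as nn

class VIModule(nn.Module):
    """Sketch of a value-iteration module (VIN-style). Hyperparameters are hypothetical."""

    def __init__(self, in_channels=2, hidden=32, n_actions=8, k_iters=20):
        super().__init__()
        # Reward head: the extra convolutional layer mentioned in the conversation
        # gives the learned reward map more capacity than a single linear conv.
        self.reward = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, 1, 3, padding=1),
        )
        # Transition model: one conv over [R; V]; the max over its output
        # channels plays the role of the max over actions in the Bellman backup.
        self.q_conv = nn.Conv2d(2, n_actions, 3, padding=1, bias=False)
        self.k_iters = k_iters  # the "horizon": number of Bellman backups unrolled

    def forward(self, obs):
        r = self.reward(obs)                          # (B, 1, H, W) reward map
        v = torch.zeros_like(r)                       # V_0 = 0
        for _ in range(self.k_iters):
            q = self.q_conv(torch.cat([r, v], dim=1))  # Q-values, one channel per action
            v, _ = torch.max(q, dim=1, keepdim=True)   # V_{k+1}(s) = max_a Q(s, a)
        return v
```

If `k_iters` is smaller than the effective planning horizon of the task, the recurrence cannot propagate value far enough to match the literally optimal value function, regardless of how expressive the reward and transition heads are.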