
2 - Learning Human Biases with Rohin Shah

AXRP - the AI X-risk Research Podcast


The Value Iteration Network Isn't Able to Express Literally Optimal Values

Ah, yes, I did. It depends a bit on your transition function, and a bit on your reward function. Like, it only becomes able to express literally optimal values when the horizon is long enough. And also, I believe you might need multiple convolutional layers in order to represent the transition function and the reward function. But I'm not sure; it's possible that both of those things happened because of the horizon issue. I was very confused about why my value iteration network was not learning very well, so I added an additional convolutional layer to the part that represents the reward, and then even the learned version started working much better. That's great.
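For readers who haven't seen Value Iteration Networks, here is a minimal PyTorch sketch (my own illustration, not code from the episode) of the two knobs being discussed: the number of value-iteration steps K, which plays the role of the horizon, and the depth of the convolutional stack that produces the reward map. All layer sizes are hypothetical.

```python
# Minimal Value Iteration Network sketch (illustrative only).
import torch
import torch.nn as nn


class ValueIterationNetwork(nn.Module):
    def __init__(self, in_channels=2, n_actions=8, k_iterations=20):
        super().__init__()
        # Reward module: two conv layers. The point made in the episode is that
        # a single layer may be too shallow to represent the reward function,
        # and adding a second one helped the learned version train.
        self.reward_net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
        )
        # Transition/Q module: a conv over [reward, value] produces one
        # Q-value channel per action.
        self.q_conv = nn.Conv2d(2, n_actions, kernel_size=3, padding=1, bias=False)
        # K must be large enough for value to propagate across the whole map;
        # if the horizon is too short, the network cannot express the
        # literally optimal values.
        self.k = k_iterations

    def forward(self, obs):
        r = self.reward_net(obs)                       # reward map, (B, 1, H, W)
        v = torch.zeros_like(r)                        # initial value estimate
        for _ in range(self.k):                        # K rounds of value iteration
            q = self.q_conv(torch.cat([r, v], dim=1))  # Q(s, a) for each cell
            v, _ = q.max(dim=1, keepdim=True)          # V(s) = max_a Q(s, a)
        return v


if __name__ == "__main__":
    vin = ValueIterationNetwork()
    grid = torch.randn(1, 2, 16, 16)  # e.g. obstacle map + goal map
    print(vin(grid).shape)            # torch.Size([1, 1, 16, 16])
```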

