
2 - Learning Human Biases with Rohin Shah
AXRP - the AI X-risk Research Podcast
00:00
Value Iteration Networks Aren't Always Able to Express the Literally Optimal Value Function
Ah, yes, I did. It depends a bit on your transition function, and a bit on your reward function. Like, it only becomes able to express the literally optimal value function when the horizon is long enough. And also, I believe you might need multiple convolutional layers in order to represent the transition function and the reward function. But I'm not sure; it's possible that both of those things happened because of the horizon issue. I was very confused why my value iteration network was not learning very well, so I added an additional convolutional layer to the part that represents the reward, and then even the learned version started working much better, which is great.
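For context on the architecture being discussed: a value iteration network (Tamar et al., 2016) predicts a reward map with convolutional layers and then unrolls K steps of value iteration, so both the horizon K and the capacity of the reward/transition convolutions limit how close the module can get to the true optimal value function. The sketch below is a minimal, hypothetical illustration of that structure, not the speaker's actual code; the module name, layer sizes, and the two-channel gridworld observation are assumptions.

```python
import torch
import torch.nn as nn


class ValueIterationModule(nn.Module):
    """Minimal sketch of a VIN-style planning module (assumed details).

    A small conv stack maps the observation to a reward map; value
    iteration is then unrolled for K steps with a conv that plays the
    role of the transition model. K bounds how far value can propagate,
    which is why a short horizon can keep the module from expressing
    the literally optimal value function.
    """

    def __init__(self, k: int = 20, num_actions: int = 8, hidden: int = 150):
        super().__init__()
        self.k = k  # number of unrolled value-iteration steps (the "horizon")
        # Two conv layers for the reward map; the episode mentions that a
        # single layer was not enough in practice.
        self.reward_conv = nn.Sequential(
            nn.Conv2d(2, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, 1, kernel_size=3, padding=1),
        )
        # Conv over [reward, value] producing per-action Q maps; its weights
        # implicitly encode the transition function.
        self.q_conv = nn.Conv2d(2, num_actions, kernel_size=3, padding=1, bias=False)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        r = self.reward_conv(obs)                      # (B, 1, H, W) reward map
        v = torch.zeros_like(r)                        # initial value estimate
        for _ in range(self.k):                        # unrolled value iteration
            q = self.q_conv(torch.cat([r, v], dim=1))  # (B, A, H, W) Q maps
            v, _ = torch.max(q, dim=1, keepdim=True)   # Bellman max over actions
        return v


# Smoke test on a random 16x16 two-channel gridworld observation.
if __name__ == "__main__":
    vin = ValueIterationModule(k=20)
    obs = torch.rand(1, 2, 16, 16)
    print(vin(obs).shape)  # torch.Size([1, 1, 16, 16])
```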