
2 - Learning Human Biases with Rohin Shah

AXRP - the AI X-risk Research Podcast


The Value Iteration Network Isn't Able to Express Literally Optimal Values

Ah, yes, I did. It depends a bit on your transition function, and a bit on your reward function. Like, it only becomes able to express literally optimal values when the horizon is long enough. And also, I believe you might need multiple convolutional layers in order to represent the transition function and the reward function. But I'm not sure; it's possible that both of those things happened because of the horizon issue. I was very confused about why my value iteration network was not learning very well, so I added an additional convolutional layer to the part that represents the reward, and then even the learned version started working much better. That's great.
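For readers who haven't seen Value Iteration Networks, here is a minimal PyTorch sketch (my own illustration, not code from the episode) of the two knobs being discussed: the number of value-iteration steps K, which plays the role of the horizon, and the depth of the convolutional stack that produces the reward map. All layer sizes are hypothetical.

```python
# Minimal Value Iteration Network sketch (illustrative only).
import torch
import torch.nn as nn


class ValueIterationNetwork(nn.Module):
    def __init__(self, in_channels=2, n_actions=8, k_iterations=20):
        super().__init__()
        # Reward module: two conv layers. The point made in the episode is that
        # a single layer may be too shallow to represent the reward function,
        # and adding a second one helped the learned version train.
        self.reward_net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
        )
        # Transition/Q module: a conv over [reward, value] produces one
        # Q-value channel per action.
        self.q_conv = nn.Conv2d(2, n_actions, kernel_size=3, padding=1, bias=False)
        # K must be large enough for value to propagate across the whole map;
        # if the horizon is too short, the network cannot express the
        # literally optimal values.
        self.k = k_iterations

    def forward(self, obs):
        r = self.reward_net(obs)                       # reward map, (B, 1, H, W)
        v = torch.zeros_like(r)                        # initial value estimate
        for _ in range(self.k):                        # K rounds of value iteration
            q = self.q_conv(torch.cat([r, v], dim=1))  # Q(s, a) for each cell
            v, _ = q.max(dim=1, keepdim=True)          # V(s) = max_a Q(s, a)
        return v


if __name__ == "__main__":
    vin = ValueIterationNetwork()
    grid = torch.randn(1, 2, 16, 16)  # e.g. obstacle map + goal map
    print(vin(grid).shape)            # torch.Size([1, 1, 16, 16])
```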

