
2 - Learning Human Biases with Rohin Shah
AXRP - the AI X-risk Research Podcast
Learning Models of the Human Demonstrator
Daniel Filan: When you're doing these experiments, when you're learning models of the human demonstrator, you use value iteration networks, right? A value iteration network, or VIN: for our audience, could you say what a VIN is?

Rohin Shah: A value iteration network is an architecture for neural networks that can be used to learn optimal policies for tabular Markov decision processes. It stacks a bunch of convolutional layers and then uses, like, max pooling layers to mimic the maximization in value iteration.
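To make that description concrete, here is a minimal sketch of the computation a VIN's convolution and max-pooling layers mimic. It rests on several assumptions. The environment is a deterministic gridworld. The hand-set one-hot 3x3 kernels (one per action) stand in for learned convolution weights, and a max over the action channels stands in for the max-pooling step. All names (`KERNELS`, `conv2d_edge`, `vin_forward`, `greedy_policy`) are hypothetical. This is plain value iteration written in convolutional form, not the network used in the experiments discussed here.

```python
from typing import Dict, List

Grid = List[List[float]]

# Hypothetical hand-set 3x3 kernels, one per action. A single 1 marks the
# neighbouring cell the action moves to. In a real VIN these weights are
# learned; here they encode a deterministic gridworld by assumption.
KERNELS: Dict[str, List[List[int]]] = {
    "up":    [[0, 1, 0], [0, 0, 0], [0, 0, 0]],
    "down":  [[0, 0, 0], [0, 0, 0], [0, 1, 0]],
    "left":  [[0, 0, 0], [1, 0, 0], [0, 0, 0]],
    "right": [[0, 0, 0], [0, 0, 1], [0, 0, 0]],
    "stay":  [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
}


def conv2d_edge(grid: Grid, kernel: List[List[int]]) -> Grid:
    """3x3 cross-correlation with edge (clamp) padding: moving into a wall
    reads the value of the current cell, i.e. the agent stays put."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(3):
                for dj in range(3):
                    ii = min(max(i + di - 1, 0), h - 1)
                    jj = min(max(j + dj - 1, 0), w - 1)
                    total += kernel[di][dj] * grid[ii][jj]
            out[i][j] = total
    return out


def q_channels(value: Grid) -> Dict[str, Grid]:
    """Convolution step: one output channel per action, holding V(next state)."""
    return {a: conv2d_edge(value, k) for a, k in KERNELS.items()}


def vin_forward(reward: Grid, gamma: float = 0.9, iterations: int = 50) -> Grid:
    """Repeat conv + channel-wise max for a fixed number of rounds (value iteration)."""
    h, w = len(reward), len(reward[0])
    value = [[0.0] * w for _ in range(h)]
    for _ in range(iterations):
        q = q_channels(value)
        # Q_a(s) = R(s) + gamma * V(next(s, a)). Because R(s) does not depend
        # on the action here, max_a Q_a(s) = R(s) + gamma * max_a V(next(s, a));
        # this max over action channels plays the role of the max-pooling layer.
        value = [
            [reward[i][j] + gamma * max(q[a][i][j] for a in KERNELS) for j in range(w)]
            for i in range(h)
        ]
    return value


def greedy_policy(value: Grid) -> List[List[str]]:
    """Pick, in each cell, the action whose channel has the highest value."""
    q = q_channels(value)
    h, w = len(value), len(value[0])
    return [[max(KERNELS, key=lambda a: q[a][i][j]) for j in range(w)] for i in range(h)]


# Usage: a 3x3 grid with a single rewarding goal in the top-right corner.
reward = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
value = vin_forward(reward)
policy = greedy_policy(value)
print(policy[0][1], policy[1][2])  # prints "right up"
```

Edge padding is chosen so that bumping into a wall behaves like staying in place. A learned VIN would instead convolve the reward and value maps together with trainable weights, and would run a fixed number of these conv-then-max rounds inside the network before a policy head.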