
2 - Learning Human Biases with Rohin Shah
AXRP - the AI X-risk Research Podcast
Learning Models of the Human Demonstrator
A value iteration network is a neural network architecture that can be used to learn optimal policies for tabular Markov decision processes. It stacks a bunch of convolutional layers and then uses max pooling layers to mimic the maximization step in value iteration.

Host: When you're doing these experiments, when you're learning models of the human demonstrator, you use value iteration networks, right? Value iteration networks, or VINs. For our audience, could you say what a VIN is?
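To make the "convolution plus max pooling" description concrete, here is a minimal sketch of ordinary value iteration on a small grid world, written so that each backup is a 3x3 convolution per action followed by a max over the action channel. This is an illustration under stated assumptions, not the architecture or code discussed in the episode: the grid world, the five hand-written one-hot kernels, the reward map, and the names `ACTION_KERNELS`, `conv3x3`, and `value_iteration_as_conv` are all hypothetical. In an actual value iteration network the kernel weights would be learned rather than fixed.

```python
import numpy as np

# One 3x3 one-hot kernel per action: up, down, left, right, stay.
# Assumption: deterministic moves on a grid. A real VIN would learn these weights.
ACTION_KERNELS = np.zeros((5, 3, 3))
ACTION_KERNELS[0, 0, 1] = 1.0  # up: read the value of the cell above
ACTION_KERNELS[1, 2, 1] = 1.0  # down: read the value of the cell below
ACTION_KERNELS[2, 1, 0] = 1.0  # left
ACTION_KERNELS[3, 1, 2] = 1.0  # right
ACTION_KERNELS[4, 1, 1] = 1.0  # stay


def conv3x3(values, kernel):
    """Cross-correlate a 2D map with a 3x3 kernel, using edge padding.

    Edge padding means that moving into a wall behaves like staying put.
    """
    padded = np.pad(values, 1, mode="edge")
    h, w = values.shape
    out = np.zeros_like(values, dtype=float)
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * padded[di:di + h, dj:dj + w]
    return out


def value_iteration_as_conv(reward, gamma=0.9, num_iters=50):
    """Run num_iters (>= 1) value iteration backups on a grid.

    Each backup is one "layer": a convolution per action gives Q-values,
    and a max over the action channel gives the new value map.
    """
    value = np.zeros_like(reward, dtype=float)
    for _ in range(num_iters):
        q = np.stack([reward + gamma * conv3x3(value, k) for k in ACTION_KERNELS])
        value = q.max(axis=0)  # the "max pooling" over actions
    return value, q.argmax(axis=0)


# Hypothetical 5x5 grid with a single rewarding cell in the center.
reward = np.zeros((5, 5))
reward[2, 2] = 1.0
value, policy = value_iteration_as_conv(reward)
```

With this reward map, the resulting greedy policy points every cell on the middle row or column toward the center cell, and the center cell itself prefers to stay. Stacking more iterations corresponds to stacking more convolution-and-max layers, which is why the depth of such a network limits how far value information can propagate across the grid.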