Learning Models of the Human Demonstrator

A value iteration network is a neural network architecture that can be used to learn optimal policies for tabular Markov decision processes. It stacks a bunch of convolutional layers and then uses max-pooling layers to mimic the maximization in value iteration.

"So when you're doing these experiments, when you're learning models of the human demonstrator, you use value iteration networks, right? Value iteration networks, or VINs. For our audience, could you say what a VIN is?"
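To make the convolution-plus-max idea concrete, here is a minimal sketch of the computation a VIN is built to mimic, written in plain Python. It is not the network from the episode: the kernels are fixed by hand rather than learned, and the grid world, the five actions, the discount `GAMMA`, and the helper names `conv3x3`, `vi_step`, and `run_vi` are all assumptions made for illustration.

```python
# Hand-written, non-learned sketch: one value iteration sweep on a grid,
# expressed as "one 3x3 convolution per action, then a max over actions".
# A real VIN would learn the kernels; here they are fixed (an assumption).

GAMMA = 0.9  # assumed discount factor

# Assumed deterministic actions, given as (row, column) offsets.
ACTION_OFFSETS = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1), "stay": (0, 0),
}


def make_kernel(offset):
    """Build a one-hot 3x3 kernel that picks the value at the given offset."""
    kernel = [[0.0] * 3 for _ in range(3)]
    kernel[offset[0] + 1][offset[1] + 1] = 1.0
    return kernel


KERNELS = {name: make_kernel(off) for name, off in ACTION_OFFSETS.items()}


def conv3x3(value, kernel):
    """Cross-correlate a 2D value map with a 3x3 kernel.

    Edges use replicate padding, so moving into a wall keeps the agent
    in its current cell.
    """
    h, w = len(value), len(value[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni = min(max(i + di, 0), h - 1)
                    nj = min(max(j + dj, 0), w - 1)
                    total += kernel[di + 1][dj + 1] * value[ni][nj]
            out[i][j] = total
    return out


def vi_step(reward, value):
    """One sweep: Q_a = r + GAMMA * conv(V, kernel_a), then V = max over a."""
    h, w = len(reward), len(reward[0])
    q_maps = []
    for kernel in KERNELS.values():
        next_value = conv3x3(value, kernel)
        q_maps.append(
            [[reward[i][j] + GAMMA * next_value[i][j] for j in range(w)]
             for i in range(h)]
        )
    # The max over the action channel plays the role of the max-pooling layer.
    return [[max(q[i][j] for q in q_maps) for j in range(w)] for i in range(h)]


def run_vi(reward, num_steps):
    """Apply num_steps sweeps starting from an all-zero value map."""
    value = [[0.0] * len(reward[0]) for _ in range(len(reward))]
    for _ in range(num_steps):
        value = vi_step(reward, value)
    return value
```

Calling `run_vi([[0.0, 0.0, 1.0]], 3)` on a one-row grid with a reward in the rightmost cell yields values that increase toward that cell. Stacking more sweeps corresponds to stacking more convolution and max layers in the network.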

Play episode from 21:21
