2 - Learning Human Biases with Rohin Shah

AXRP - the AI X-risk Research Podcast

Learning Models of the Human Demonstrator

A value iteration network is a neural network architecture that can be used to learn optimal policies for tabular Markov decision processes. It stacks a bunch of convolutional layers and then uses max pooling layers to mimic the maximization in value iteration.

"When you're doing these experiments, when you're learning models of the human demonstrator, you use value iteration networks, right? Value iteration networks, or VINs: for our audience, could you say what a VIN is?"
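To make the "convolution plus max" idea concrete, here is a minimal sketch in plain Python. It is not code from the episode or from any VIN implementation. It assumes a deterministic gridworld with non-negative rewards, five hypothetical actions (stay, up, down, left, right), and hand-set 3x3 kernels; a real value iteration network would learn its kernels from data. The names `OFFSETS`, `make_kernels`, `conv2d_same`, and `vin_planning` are made up for illustration.

```python
# Hypothetical action set: stay, up, down, left, right (row/column offsets).
OFFSETS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def make_kernels():
    """One 3x3 kernel per action; each picks out a single neighbour's value."""
    kernels = []
    for dr, dc in OFFSETS:
        kernel = [[0.0] * 3 for _ in range(3)]
        kernel[1 + dr][1 + dc] = 1.0
        kernels.append(kernel)
    return kernels


def conv2d_same(x, kernel):
    """Cross-correlate a 2D map with a 3x3 kernel, using zero padding."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            for i in range(3):
                for j in range(3):
                    rr, cc = r + i - 1, c + j - 1
                    if 0 <= rr < h and 0 <= cc < w:
                        out[r][c] += kernel[i][j] * x[rr][cc]
    return out


def vin_planning(reward, gamma=0.9, num_iters=20):
    """Run num_iters rounds of 'convolve, then max over action channels'.

    Assumes non-negative rewards, so stepping off the grid (value 0 from
    the zero padding) is never better than staying put.
    """
    h, w = len(reward), len(reward[0])
    kernels = make_kernels()
    value = [[0.0] * w for _ in range(h)]
    q = []
    for _ in range(num_iters):
        # "Convolutional layer": one Q-value channel per action.
        q = []
        for kernel in kernels:
            shifted = conv2d_same(value, kernel)
            q.append([[reward[r][c] + gamma * shifted[r][c] for c in range(w)]
                      for r in range(h)])
        # "Max pooling" across action channels: the max in value iteration.
        value = [[max(channel[r][c] for channel in q) for c in range(w)]
                 for r in range(h)]
    return value, q


# Usage: a 4x4 grid with a single rewarding cell in the top-right corner.
reward = [[0.0] * 4 for _ in range(4)]
reward[0][3] = 1.0
value, q = vin_planning(reward, gamma=0.9, num_iters=20)
```

Each loop iteration corresponds to one layer of the stacked architecture, so the number of layers bounds how far value information can propagate across the grid.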
