2 - Learning Human Biases with Rohin Shah

AXRP - the AI X-risk Research Podcast

CHAPTER

Learning Models of the Human Demonstrator

A value iteration network is a neural network architecture that can be used to learn optimal policies for tabular Markov decision processes. It stacks a bunch of convolutional layers and then uses max pooling layers to mimic the maximization in the value iteration update. The question put to Rohin Shah: "When you're doing these experiments, when you're learning models of the human demonstrator, you use value iteration networks, right? Value iteration networks, or VINs: for our audience, could you say what a VIN is?"
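
To make the "convolution, then max" structure concrete, here is a minimal sketch of plain value iteration on a small gridworld, written so that each update has the same two stages a value iteration network imitates. Everything in it is an assumption for illustration, not code from the episode or from any VIN implementation: the function names, the five deterministic actions, the rule that off-grid moves leave the agent in place, and the convention Q(s, a) = R(s) + gamma * V(s'). A real VIN learns the convolution weights from data, whereas here the "filters" are fixed shifts written by hand.

```python
# Hand-written sketch of value iteration in "conv + max" form.
# All names and the gridworld dynamics are hypothetical illustrations.

# One (row, col) offset per action: stay, up, down, left, right.
ACTIONS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def shift(values, di, dj):
    """Return a grid whose (i, j) entry is the value of the cell reached
    by moving (di, dj) from (i, j). Off-grid moves stay in place."""
    h, w = len(values), len(values[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            ni, nj = i + di, j + dj
            if not (0 <= ni < h and 0 <= nj < w):
                ni, nj = i, j
            row.append(values[ni][nj])
        out.append(row)
    return out


def vi_step(reward, values, gamma):
    """One Bellman backup, split into the two stages a VIN mimics."""
    h, w = len(reward), len(reward[0])
    # Stage 1 (plays the role of the convolution): one Q "channel" per action,
    # computed from the reward map and a shifted copy of the value map.
    q_channels = []
    for di, dj in ACTIONS:
        nxt = shift(values, di, dj)
        q_channels.append(
            [[reward[i][j] + gamma * nxt[i][j] for j in range(w)] for i in range(h)]
        )
    # Stage 2 (plays the role of the max pooling): max over the action channels.
    return [[max(q[i][j] for q in q_channels) for j in range(w)] for i in range(h)]


def value_iteration(reward, gamma=0.9, iterations=50):
    """Apply vi_step repeatedly, starting from an all-zero value map."""
    values = [[0.0] * len(reward[0]) for _ in reward]
    for _ in range(iterations):
        values = vi_step(reward, values, gamma)
    return values


if __name__ == "__main__":
    # A 1x3 corridor with reward only in the rightmost cell.
    print(value_iteration([[0.0, 0.0, 1.0]], gamma=0.5, iterations=3))
```

Stacking more iterations corresponds to stacking more layers: each extra backup lets reward information travel one cell further across the grid, which is why the depth of such a network bounds how far ahead it can effectively plan.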
