Neural Networks and Internal Representations
In reinforcement learning, a neural network is trained by rewarding or penalizing its behavior. The speaker argues that such networks develop internal representations of outcomes, and that these representations influence the actions they take.
Internal Outcome Representation: "...these networks are going to have internal representations of outcomes. That is, you know, within the weights of those networks, there are going to be some neurons corresponding to different concepts. And some of those concepts are going to be outcomes in the real world."
Evidence from Image Recognition: "...there are studies of neural networks that were used for image recognition, where they could identify the specific neurons that corresponded to like pictures of cats or pictures of dogs..."
Beyond Simple Objects: "...in these more general systems that are trained on a wider range of tasks, they're going to learn representations of actual real world outcomes. Like, you know, the human is happy with my performance, or I got to the end of the maze...".
Action Selection: "...exactly how they're going to use those representations to choose actions, you know, feels like a very open question. But the main claim I'm making here is just that like these networks are in fact going to learn these complex representations."
Desirable vs. Undesirable Outcomes: "...those representations are going to feed into their actions via, you know, some of them representing desirable outcomes and some of them representing undesirable outcomes. And then they're just going to learn to choose actions which tends to lead more towards the desirable outcomes...".
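The sketch below makes the last point concrete. It is not from the episode. It assumes a hypothetical two-action environment in which each action leads to a "goal reached" or "failed" outcome with a fixed hidden probability. To keep it short, it swaps the neural network for a simple lookup table of learned values. The names `OUTCOME_PROBS`, `REWARD`, `step`, and `train` are all illustrative. Reward alone is enough for the agent to build an internal estimate of which action leads to the desirable outcome, and then to prefer that action.

```python
import random

# Hypothetical toy environment (an assumption, not from the source):
# each action leads to "goal_reached" with a fixed hidden probability, else "failed".
OUTCOME_PROBS = {"left": 0.2, "right": 0.8}  # P(goal_reached | action)
REWARD = {"goal_reached": 1.0, "failed": -1.0}  # desirable vs. undesirable outcome


def step(action, rng):
    """Sample an outcome for the chosen action."""
    return "goal_reached" if rng.random() < OUTCOME_PROBS[action] else "failed"


def train(episodes=2000, epsilon=0.1, lr=0.05, seed=0):
    """Epsilon-greedy value learning; a lookup table stands in for a neural network."""
    rng = random.Random(seed)
    # The agent's internal estimate of how good each action's outcomes are.
    values = {action: 0.0 for action in OUTCOME_PROBS}
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(list(values))  # occasionally explore
        else:
            action = max(values, key=values.get)  # otherwise act on current estimates
        reward = REWARD[step(action, rng)]
        # Move the estimate toward the observed reward (incremental update).
        values[action] += lr * (reward - values[action])
    return values


if __name__ == "__main__":
    learned = train()
    print(learned, "->", max(learned, key=learned.get))
```

With these settings, `right` should end with an estimate near its expected reward of 0.6 (0.8 × 1 + 0.2 × −1), and `left` should end near −0.6. The greedy choice therefore shifts toward the action that more often leads to the desirable outcome.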
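The takeaway: reward and punishment shape a neural network's behavior from the outside, but the network also learns its own internal representations of outcomes. Those representations steer it toward the more desirable outcomes, much as a goal-seeking system would. By the speaker's own account, exactly how the representations drive action selection remains an open question.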