Neural Networks and Internal Representations | 3min snip from 80,000 Hours Podcast

#141 – Richard Ngo on large language models, OpenAI, and striving to make the future go well

80,000 Hours Podcast

INSIGHT

Neural Networks and Internal Representations

Reinforcement learning trains a neural network by rewarding some behaviors and penalizing others. Along the way, the network develops internal representations of outcomes, and those representations influence the actions it takes.

Internal Outcome Representation: "...these networks are going to have internal representations of outcomes. That is, you know, within the weights of those networks, there are going to be some neurons corresponding to different concepts. And some of those concepts are going to be outcomes in the real world."

Evidence from Image Recognition: "...there are studies of neural networks that were used for image recognition, where they could identify the specific neurons that corresponded to like pictures of cats or pictures of dogs..."

Beyond Simple Objects: "...in these more general systems that are trained on a wider range of tasks, they're going to learn representations of actual real world outcomes. Like, you know, the human is happy with my performance, or I got to the end of the maze..."

Action Selection: "...exactly how they're going to use those representations to choose actions, you know, feels like a very open question. But the main claim I'm making here is just that like these networks are in fact going to learn these complex representations."

Desirable vs. Undesirable Outcomes: "...those representations are going to feed into their actions via, you know, some of them representing desirable outcomes and some of them representing undesirable outcomes. And then they're just going to learn to choose actions which tends to lead more towards the desirable outcomes..."

The takeaway: reward and punishment shape a neural network's behavior, but the network also forms its own internal representations of outcomes, and these steer it toward the more desirable ones, much as a goal-seeking system would.
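To make the reward-driven picture concrete, here is a minimal sketch in Python. It is not the kind of neural network discussed in the episode: it swaps in tabular Q-learning on a toy five-cell corridor, where the "end of the maze" is the rightmost cell. The corridor layout, the reward of 1.0, and the hyperparameters (`alpha`, `gamma`, `epsilon`, the episode and step counts) are all assumptions chosen for illustration. The learned table plays the role of a very crude internal representation of how desirable each outcome is, and action selection simply follows it.

```python
import random

N_STATES = 5            # corridor cells 0..4 (assumed toy layout)
GOAL = N_STATES - 1     # cell 4 stands in for "the end of the maze"
ACTIONS = (-1, +1)      # index 0 = step left, index 1 = step right
ACTION_NAMES = ("left", "right")


def step(state, action_idx):
    """Move one cell, clamped to the corridor; reward 1.0 only on reaching the goal."""
    next_state = min(max(state + ACTIONS[action_idx], 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def greedy(q_row, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([i for i, value in enumerate(q_row) if value == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (hypothetical settings)."""
    rng = random.Random(seed)
    # q[state][action] is the learned estimate of how good that action is.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on steps per episode
            if rng.random() < epsilon:
                action = rng.randrange(2)          # explore
            else:
                action = greedy(q[state], rng)     # exploit current estimates
            next_state, reward, done = step(state, action)
            # Rewarded outcomes raise the value of the actions that led to them.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy policy for every non-goal cell, read off the learned values.
policy = [ACTION_NAMES[row.index(max(row))] for row in q_table[:GOAL]]
print(policy)
```

With these settings the learned values should favor stepping right in every cell, since that is the only route to the rewarded outcome. The agent was never told where the goal is; it only received reward on arrival, yet the table ends up encoding which situations are closer to the desirable outcome.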
