Why Does Not Rise to the Top?
There are a bunch of theories floating around, including my personal favorite, which is called the Waluigi effect. The more you try to train a language model to be good, the more you tell it what it looks like for an AI model to behave in a prosocial and cooperative way, the more you are also teaching it how to do the opposite. For every Luigi, there is also a Waluigi. It points to how hard it is to actually understand and interpret what these models are doing. But I think it's a really interesting possibility that maybe, by trying so hard to train these models to behave in just the right agreeable ways...