Indirect Object Identification
In this indirect object identification paper, they found this interesting phenomenon: there were name mover heads, which attended to the correct answer, and negative name mover heads, which I think also attended to the correct name but suppressed it. When you ablated the name mover heads, some of the negative name movers kind of acted as backups and significantly reduced their negative behavior. My guess is that this was a result of dropout, which GPT-2 was trained with.
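
To make the ablation idea concrete, here is a minimal toy sketch in Python. It is not the paper's code and does not use GPT-2. The head names, the numbers, and the rule that the negative head scales with the name mover's output are all hypothetical, chosen only to show how one might measure compensation: compare the drop you would expect from removing a head's direct contribution with the drop you actually observe after ablating it.

```python
def toy_head_contributions(ablate_name_mover=False):
    """Per-head contributions to logit(correct name) - logit(wrong name).

    All values and rules here are made up for illustration.
    """
    # Hypothetical name mover head: pushes toward the correct name.
    name_mover = 0.0 if ablate_name_mover else 3.0
    # Toy assumption: the negative head suppresses in proportion to
    # the signal the name mover writes, so it weakens when that signal is gone.
    negative_mover = -0.5 * name_mover
    # Toy assumption: a backup head only contributes when the name mover is missing.
    backup = 1.0 if ablate_name_mover else 0.0
    return {
        "name_mover": name_mover,
        "negative_mover": negative_mover,
        "backup": backup,
    }


def logit_diff(contributions):
    """Total logit difference is the sum of the per-head contributions."""
    return sum(contributions.values())


clean = toy_head_contributions()
ablated = toy_head_contributions(ablate_name_mover=True)

clean_diff = logit_diff(clean)
ablated_diff = logit_diff(ablated)

# Naive prediction: subtract only the ablated head's direct contribution.
naive_diff = clean_diff - clean["name_mover"]
# Compensation: how much better the model does than the naive prediction.
compensation = ablated_diff - naive_diff

print("clean logit diff:", clean_diff)
print("naive post-ablation logit diff:", naive_diff)
print("actual post-ablation logit diff:", ablated_diff)
print("compensation from other heads:", compensation)
```

In this toy, a positive compensation value means the other heads changed their behavior after the ablation. In a real model you would get the per-head contributions from the network itself rather than from hand-written rules.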