
"Steering GPT-2-XL by adding an activation vector" by TurnTrout et al.


Anger Minus Calm in Lower Case Doesn't Work at All

For the 'er' and 'm' lines, the magnitude rapidly shoots down to about 0.6, then increases, crossing over at around layer 10 to end up higher than the previous line. The authors write that this is evidence that low norm can't explain why 'anger' minus 'calm' in all lowercase doesn't work. Here's another graph, again layer number versus magnitude, with layer number on the x-axis. It has the end-of-text line again, stuck at zero the whole time. And the 'anger' minus 'calm', all lowercase, position-one line starts just below 1.2, shoots up to just above 1.4, then rapidly decreases to just below 1. By the time it reaches layer 10, it's down to 0.8.
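The measurement the narrator is describing is straightforward to sketch in code. Below is a minimal, hedged reconstruction, not the authors' exact code, using the TransformerLens library: it builds the lowercase 'anger' minus 'calm' steering vector as a difference of cached residual streams and prints its per-layer magnitude at each token position. The `resid_pre` hook point and the raw (unnormalized) norm are assumptions; the post's figures may normalize these magnitudes, for example against the capitalized ' Anger' minus ' Calm' vector.

```python
# A minimal sketch, assuming the TransformerLens library (not the authors'
# exact code): per-layer magnitude of the lowercase "anger" - "calm" vector.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-xl")

def resid_streams(prompt: str) -> torch.Tensor:
    """Residual stream at the input of each block: (n_layers, pos, d_model)."""
    with torch.no_grad():
        _, cache = model.run_with_cache(prompt)  # prepends <|endoftext|>
    return torch.stack(
        [cache["resid_pre", layer][0] for layer in range(model.cfg.n_layers)]
    )

# The subtraction only aligns if both prompts tokenize to the same length;
# the section above mentions final tokens "er" and "m", suggesting each word
# splits into two BPE tokens after the shared <|endoftext|> at position 0.
assert model.to_tokens("anger").shape == model.to_tokens("calm").shape

steering = resid_streams("anger") - resid_streams("calm")
magnitude = steering.norm(dim=-1)  # (n_layers, pos)

# Position 0 is <|endoftext|> in both prompts, so its difference is exactly
# zero at every layer: the flat "end of text" line in the graphs.
for pos in range(magnitude.shape[1]):
    print(f"position {pos}:", [round(m, 3) for m in magnitude[:, pos].tolist()])
```

Dividing `magnitude` by the analogous norms for the capitalized prompts would turn these into relative curves like the crossover described above, though that normalization is an assumption here.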
