"Steering GPT-2-XL by adding an activation vector" by TurnTrout et al.

GPT-2's Byte-Pair Encoding Tokenizer

We start running an ordinary GPT-2-XL forward pass on the prompt "I love dogs" until layer 6. Right before layer 6 begins, we now add in the cached residual stream vectors from before. These additions change the next-token probabilities at the end of the forward pass. We can also weight vector additions by a coefficient, instead of adding the vectors into streams 0 and 1 unmodified; in the above example, our coefficient was +5. Even subtracting the vector several times still largely preserves model capabilities.
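To make the procedure concrete, here is a rough sketch of activation addition using Hugging Face transformers and a plain PyTorch forward pre-hook. This is not the authors' own code: the steering prompts "Love" and "Hate", the helper names, and the `use_cache=False` choice are illustrative assumptions; the excerpt above only fixes the injection layer (6), the prompt ("I love dogs"), and the coefficient (+5).

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

MODEL = "gpt2-xl"   # swap in "gpt2" for a quick, low-memory test
LAYER = 6           # inject right before block 6, as in the example
COEFF = 5.0         # the +5 coefficient mentioned above

tok = GPT2Tokenizer.from_pretrained(MODEL)
model = GPT2LMHeadModel.from_pretrained(MODEL).eval()


def resid_pre(prompt: str) -> torch.Tensor:
    """Cache the residual stream entering block LAYER for `prompt`."""
    cache = {}

    def grab(module, args):
        cache["resid"] = args[0].detach()  # (batch, seq, d_model)

    handle = model.transformer.h[LAYER].register_forward_pre_hook(grab)
    with torch.no_grad():
        model(**tok(prompt, return_tensors="pt"))
    handle.remove()
    return cache["resid"][0]  # (seq, d_model)


# Steering vector: difference of two cached residual streams. "Love" and
# "Hate" are assumed steering prompts, not taken from the episode.
h_a, h_b = resid_pre("Love"), resid_pre("Hate")
n = min(h_a.shape[0], h_b.shape[0])  # align lengths, just in case
steer = h_a[:n] - h_b[:n]


def add_steering(module, args):
    resid = args[0].clone()
    resid[:, :n, :] += COEFF * steer  # add into streams 0..n-1
    return (resid,) + args[1:]


handle = model.transformer.h[LAYER].register_forward_pre_hook(add_steering)
with torch.no_grad():
    # use_cache=False so every step re-runs the full prompt and the
    # addition stays pinned to the front residual streams
    out = model.generate(**tok("I love dogs", return_tensors="pt"),
                         max_new_tokens=25, do_sample=False,
                         use_cache=False)
handle.remove()
print(tok.decode(out[0]))
```

Note that the addition is applied to the first few residual stream positions on every forward pass, which is why KV caching is disabled during generation in this sketch.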
