"Steering GPT-2-XL by adding an activation vector" by TurnTrout et al.

LessWrong (Curated & Popular)

The Effect of Steering Vectors on Weddingness

GPT-2 XL has a 1600-dimensional residual stream. We wanted to see whether we could still get a steering effect by adding the steering vector to only some of those dimensions, for example dimensions 0 to 799. To test this, we generated 100 completions for each of 6 prompts at each of a range of fraction values, where the fraction is the share of residual-stream dimensions that the steering vector modifies. For each fraction value and prompt, we plotted the average number of wedding words per completion. Every line rises as the fraction of affected dimensions grows, but the rise is uneven: the lines tend to stay fairly flat until a fraction of about 0.6 and increase more steeply after that.
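
The masking and the metric described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' code: the function names, the `WEDDING_WORDS` list, and the choice to keep the first `round(fraction * d)` dimensions are all hypothetical, and the sketch operates on a single residual vector rather than on a real model's activations.

```python
import re

D_MODEL = 1600  # width of the GPT-2 XL residual stream, as stated in the text

# Hypothetical word list: the source does not say which words count as "wedding words".
WEDDING_WORDS = {"wedding", "weddings", "bride", "groom", "married", "marry", "marriage"}


def mask_steering_vector(steering, fraction):
    """Keep the first round(fraction * d) entries of the vector and zero the rest.

    With d = 1600 and fraction = 0.5 this keeps dimensions 0 to 799.
    """
    n_keep = round(fraction * len(steering))
    return [x if i < n_keep else 0.0 for i, x in enumerate(steering)]


def add_to_residual(residual, steering, fraction):
    """Add the masked steering vector to one residual-stream vector."""
    masked = mask_steering_vector(steering, fraction)
    return [r + s for r, s in zip(residual, masked)]


def count_wedding_words(text):
    """Count tokens in the text that appear in the (assumed) wedding word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(tok in WEDDING_WORDS for tok in tokens)


def mean_wedding_words(completions):
    """Average wedding-word count over a batch of completions."""
    return sum(count_wedding_words(c) for c in completions) / len(completions)
```

To reproduce the shape of the plot, one would call `mean_wedding_words` on the 100 completions sampled for each (fraction, prompt) pair and draw one line per prompt. Sampling the completions from the steered model is outside the scope of this sketch.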

