"Steering GPT-2-XL by adding an activation vector" by TurnTrout et al.

LessWrong (Curated & Popular)

The Effect of Steering Vectors on the Output of Weddings

The steering vector is usually shorter than the tokenized prompt, so there is a choice of token positions in the residual stream at which to add it. The authors write that the front and middle additions led to coherent outputs, but the back addition didn't. In further work, they'd like to investigate this for different prompts and larger numbers of generations.
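The choice of position can be sketched with toy arrays. This is a minimal illustration, not the authors' code: the function name `add_steering`, the array shapes, and the definition of "middle" as the centered offset are all assumptions made for the example.

```python
import numpy as np

def add_steering(resid, steer, position="front"):
    """Return a copy of `resid` with `steer` added at a chosen token offset.

    resid: (n_prompt_tokens, d_model) toy residual-stream activations.
    steer: (n_steer_tokens, d_model) toy steering vector.
    Both names and shapes are hypothetical, chosen for illustration.
    """
    n_prompt, n_steer = resid.shape[0], steer.shape[0]
    if n_steer > n_prompt:
        raise ValueError("steering vector is longer than the prompt")
    slack = n_prompt - n_steer
    # Assumption: "middle" means the span is centered in the prompt.
    offsets = {"front": 0, "middle": slack // 2, "back": slack}
    start = offsets[position]
    out = resid.copy()
    out[start:start + n_steer] += steer
    return out

# Toy example: a 6-token prompt and a 2-token steering vector.
resid = np.zeros((6, 4))
steer = np.ones((2, 4))
front = add_steering(resid, steer, "front")    # modifies tokens 0-1
middle = add_steering(resid, steer, "middle")  # modifies tokens 2-3
back = add_steering(resid, steer, "back")      # modifies tokens 4-5
```

In a real model the addition would happen inside the forward pass at a chosen layer; the sketch only shows how the three placements differ in which token positions are modified.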
