
"Steering GPT-2-XL by adding an activation vector" by TurnTrout et al.

LessWrong (Curated & Popular)


The Unsteered Completions

GPT-2 applies layer norm before each attention and MLP sublayer. Additions without a paired, counterbalancing subtraction don't work as well. The following steering vector produced rather unloving completions. We could not find a "speak in French" vector after about an hour of effort, but it's possible we missed something straightforward.
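
To make the idea of a paired addition and subtraction concrete, here is a minimal sketch on toy numbers. It assumes the steering vector is formed as a coefficient times the difference between activations for two contrasting prompts, and that it is added to the earliest positions of the residual stream. The function names (`layer_norm`, `paired_steering_vector`, `add_to_residual`), the coefficient, and all values are hypothetical and chosen for illustration; this is not real GPT-2-XL code, and the layer norm shown omits the learned gain and bias.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row of activations per token position


def layer_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize one position to zero mean and unit variance.

    Simplified: the learned gain and bias are left out (assumption).
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def paired_steering_vector(act_plus: Matrix, act_minus: Matrix, coeff: float) -> Matrix:
    """Return coeff * (act_plus - act_minus), position by position.

    The subtraction of act_minus is the counterbalancing term.
    """
    if len(act_plus) != len(act_minus):
        raise ValueError("both prompts must cover the same number of positions")
    return [
        [coeff * (p - m) for p, m in zip(row_p, row_m)]
        for row_p, row_m in zip(act_plus, act_minus)
    ]


def add_to_residual(residual: Matrix, steering: Matrix) -> Matrix:
    """Add the steering vector to the first positions; later positions are unchanged."""
    steered = [row[:] for row in residual]
    for i, s_row in enumerate(steering[: len(steered)]):
        steered[i] = [r + s for r, s in zip(steered[i], s_row)]
    return steered


# Toy activations for two contrasting prompts (made-up values).
act_plus = [[1.0, 2.0], [3.0, 4.0]]
act_minus = [[0.5, 1.0], [1.0, 1.0]]
steering = paired_steering_vector(act_plus, act_minus, coeff=2.0)

# Toy residual stream with three positions; only the first two are steered.
residual = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
steered = add_to_residual(residual, steering)

# Each sublayer would read the layer-normed version of the steered stream.
normed = [layer_norm(row) for row in steered]
```

Because each sublayer reads a normalized copy of the residual stream, the direction of the added vector relative to the existing activations matters as well as its size, which is one plausible reading of why the coefficient needs tuning.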

