"Steering GPT-2-XL by adding an activation vector" by TurnTrout et al.

LessWrong (Curated & Popular)

The Effect of Anger Steering Vectors on the Quality of Completions

"We didn't notice completions which were angrier than unsteered GPT-2-XL. At most, adding the embeddings of 'Anger' minus 'Calm' (with capital letters) to layer 20 has a very small effect on the qualitative anger of the completions," he says. "This is evidence that the heads in layers 0 through 19 are doing a lot of work ... such that the steering vector actually increases the probability of angry completions."
