"Steering GPT-2-XL by adding an activation vector" by TurnTrout et al.

LessWrong (Curated & Popular)

How Steering Vectors Impact GPT-2's Capabilities

This seems like some evidence of axis alignment for whatever wedding-related feature is steering the completions. We just don't know how to put together the clues yet. This extremely detailed, made-up, and maybe-wrong hypothesis would explain the increase in weddingness as we add more dimensions. However, it does not explain the non-monotonicity of the relationship between the fraction of dimensions added and the weddingness of the completions. The results are recorded with our results here.
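To make "fraction of dimensions added" concrete, here is a minimal NumPy sketch of one way such an intervention could look. It assumes the fraction refers to the leading residual-stream dimensions and that the steering vector is added at the first few token positions. The function name `add_partial_steering`, its parameters, and the array shapes are hypothetical illustrations, not taken from the episode or the authors' code.

```python
import numpy as np


def add_partial_steering(resid, steering, fraction, coeff=1.0):
    """Add only the leading `fraction` of a steering vector's dimensions.

    resid:    (seq_len, d_model) residual-stream activations (hypothetical shape)
    steering: (steer_len, d_model) steering activations (hypothetical shape)
    fraction: share of residual-stream dimensions to modify, in [0, 1]
    coeff:    scalar multiplier applied to the steering vector
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    d_model = resid.shape[-1]
    # Assumption: "fraction of dimensions" means the first k of d_model dimensions.
    k = int(round(fraction * d_model))
    out = resid.copy()
    # Only the overlapping leading token positions are modified.
    n = min(resid.shape[0], steering.shape[0])
    out[:n, :k] += coeff * steering[:n, :k]
    return out


# Toy example: 4 positions, 8 dimensions, steer half of the dimensions.
resid = np.zeros((4, 8))
steering = np.ones((2, 8))
steered = add_partial_steering(resid, steering, fraction=0.5)
```

Sweeping `fraction` from 0 to 1 and scoring the completions at each setting is the kind of experiment the passage describes. The sketch only shows the masking step and says nothing about why the measured relationship is non-monotonic.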
