"Steering GPT-2-XL by adding an activation vector" by TurnTrout et al.

LessWrong (Curated & Popular)

The Effects of Activation on Next Token Probabilities

The token "wedding" was already probable before the intervention, and now it's more than 10 times more likely than any other token. The changes are what we'd expect from a model that talks about weddings more often. We can also measure the impact of the steering vector by the KL divergence between P_steer and P_normal, the next-token distributions with and without the steering vector.
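
As a rough illustration of that KL measurement, here is a minimal sketch in plain Python. The token probabilities are made-up toy numbers, not results from the post, and the direction KL(P_steer || P_normal) is an assumption about which way the divergence is taken. The helper name `kl_divergence` is hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) in nats for two distributions given as token -> probability dicts."""
    total = 0.0
    for token, p_prob in p.items():
        if p_prob == 0.0:
            continue  # terms with p = 0 contribute nothing
        q_prob = max(q.get(token, 0.0), eps)  # floor q to avoid division by zero
        total += p_prob * math.log(p_prob / q_prob)
    return total


# Toy next-token distributions (assumed values, for illustration only).
p_normal = {" wedding": 0.40, " party": 0.30, " meeting": 0.20, " funeral": 0.10}
p_steer = {" wedding": 0.85, " party": 0.08, " meeting": 0.05, " funeral": 0.02}

# Overall impact of steering, as a single number.
kl = kl_divergence(p_steer, p_normal)

# Per-token contributions show which tokens drive the divergence.
contributions = {
    token: p_steer[token] * math.log(p_steer[token] / p_normal[token])
    for token in p_steer
}
top_token = max(contributions, key=contributions.get)

print(f"KL(P_steer || P_normal) = {kl:.4f} nats")
print(f"Largest contribution comes from: {top_token!r}")
```

In a real experiment the two dictionaries would be replaced by the model's softmax outputs at the same position, with and without the added activation vector. Breaking the sum into per-token contributions is one way to see which tokens account for most of the shift.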