
"Steering GPT-2-XL by adding an activation vector" by TurnTrout et al.

LessWrong (Curated & Popular)


Sentences About Shipping Aren't Changed

The layer-16, coefficient +1 wedding vector affects perplexity on a sentence-by-sentence basis. The tokens with large probability increases include the expected "wedding", but also "couples", "celebrations", and other semantically associated tokens. For layer-16 injections of "weddings", coefficients larger than +3 start to degrade capabilities. Some of our demonstrations probably did degrade capabilities.

Activation Addition Behaves Differently Than Prompting
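
The wedding-vector results above come from activation addition rather than prompting. The input tokens stay the same, and a scaled vector is added to the residual stream at one layer (layer 16 here, with coefficient +1). The following toy, pure-Python sketch is not the authors' code and rests on stated assumptions. It assumes the vector is built as coeff × (activations of a positive prompt − activations of a contrast prompt) and added at the front positions of the sequence. The helper names (`steering_vector`, `add_to_residual`, `perplexity`, `top_increases`) and all numbers are hypothetical. The last two helpers show the kind of per-token comparison behind the perplexity observation. Perplexity is computed as exp of the mean negative log-probability, and tokens are ranked by how much steering raised their log-probability.

```python
import math
from typing import List, Tuple

Vector = List[float]


def steering_vector(acts_plus: List[Vector], acts_minus: List[Vector],
                    coeff: float) -> List[Vector]:
    """Return coeff * (acts_plus - acts_minus), position by position.

    Hypothetical helper. acts_plus / acts_minus stand in for residual-stream
    activations (one vector per token) recorded at the injection layer for the
    two prompts. Assumes both prompts were already padded to the same length.
    """
    if len(acts_plus) != len(acts_minus):
        raise ValueError("pad both prompts to the same number of tokens first")
    return [[coeff * (p - m) for p, m in zip(hp, hm)]
            for hp, hm in zip(acts_plus, acts_minus)]


def add_to_residual(resid: List[Vector], steer: List[Vector]) -> List[Vector]:
    """Add steer to the first len(steer) positions of resid (assumed front alignment).

    The prompt tokens are never touched; only hidden states change.
    Positions past the end of steer pass through unchanged.
    """
    out = [list(h) for h in resid]
    for pos, vec in enumerate(steer[:len(out)]):
        out[pos] = [a + b for a, b in zip(out[pos], vec)]
    return out


def perplexity(token_probs: List[float]) -> float:
    """exp(mean negative log-probability the model gave each observed token)."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))


def top_increases(tokens: List[str], base: List[float], steered: List[float],
                  k: int = 3) -> List[Tuple[str, float]]:
    """Rank tokens by log p_steered - log p_base, largest increase first."""
    shifts = [math.log(s) - math.log(b) for b, s in zip(base, steered)]
    return sorted(zip(tokens, shifts), key=lambda t: t[1], reverse=True)[:k]


if __name__ == "__main__":
    # Toy 2-dimensional activations; the numbers are made up for illustration.
    h_plus = [[1.0, 2.0], [0.5, 0.0]]
    h_minus = [[0.0, 1.0], [0.5, 1.0]]
    steer = steering_vector(h_plus, h_minus, coeff=1.0)
    print(steer)  # [[1.0, 1.0], [0.0, -1.0]]
    resid = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    print(add_to_residual(resid, steer))  # [[1.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
```

In a real model the vector would be hooked into GPT-2-XL's layer-16 residual stream during the forward pass. This toy version only shows the arithmetic. Raising `coeff` simply scales the added vector, and the sketch says nothing about where capability degradation begins.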

(Excerpt from 01:14:31 in the episode.)
