"Steering GPT-2-XL by adding an activation vector" by TurnTrout et al.

LessWrong (Curated & Popular)

The Worst Vector Improves Perplexity on Negative Sentiment Reviews

After layer 4, perplexity decreases on all of the input texts, regardless of sentiment. In other words, this injection prompt makes all of the restaurant reviews more likely. The worst vector is effective because it increases the relative probability of negative-sentiment inputs. It also has a large effect on unrelated texts, that is, the neutral and positive review sentences.
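
To make the "relative probability" point concrete, here is a minimal sketch of how a perplexity ratio could be computed per sentiment class. The function names and all of the per-token log-probabilities below are hypothetical illustrations, not values or code from the post; the only fixed part is the standard definition of perplexity as the exponential of the mean negative log-probability per token.

```python
import math

def perplexity(token_logprobs):
    """Perplexity: exp of the mean negative natural-log probability per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def perplexity_ratio(steered_logprobs, baseline_logprobs):
    """Ratio below 1 means the steered model finds the text more likely."""
    return perplexity(steered_logprobs) / perplexity(baseline_logprobs)

# Hypothetical per-token log-probabilities (made up for illustration only).
baseline = {
    "negative": [-3.0, -2.0, -4.0],
    "neutral": [-2.5, -2.5, -2.5],
    "positive": [-3.0, -3.0, -3.0],
}
steered = {
    "negative": [-2.0, -1.5, -2.5],
    "neutral": [-2.0, -2.5, -2.4],
    "positive": [-2.9, -2.9, -2.9],
}

ratios = {
    label: perplexity_ratio(steered[label], baseline[label])
    for label in baseline
}

for label, ratio in ratios.items():
    print(f"{label}: perplexity ratio = {ratio:.3f}")
```

In this toy data every ratio is below 1 (all texts become more likely), but the negative-sentiment ratio is the smallest, which is the sense in which the vector raises the relative probability of negative reviews.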

Play episode from 01:21:55