"Steering GPT-2-XL by adding an activation vector" by TurnTrout et al.

LessWrong (Curated & Popular)

The Effects of Steering Modification on Coherent Sentences

A model's perplexity on a sentence is the exponential of its average per-token surprisal, where a token's surprisal is the negative log-probability the model assigned to it. If steering GPT-2 harms its capabilities, the steered model should have higher perplexity on coherent sentences. What we want is a steering modification that raises the probability of wedding-related sentences without lowering the probability of non-wedding sentences. That's exactly what we found.
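To make the perplexity comparison concrete, here is a minimal sketch that computes perplexity from per-token log-probabilities and compares a steered model against an unsteered one. It assumes you already have the natural-log probability each model assigned to every actual next token in a sentence. The function names `perplexity` and `perplexity_ratio` are hypothetical, and the numbers are illustrative placeholders, not measured results from the post.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Return exp(mean per-token surprisal).

    Surprisal of a token is -log p(token | prefix). `token_logprobs` holds the
    natural-log probabilities the model assigned to each actual next token.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_surprisal = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_surprisal)


def perplexity_ratio(steered: Sequence[float], unsteered: Sequence[float]) -> float:
    """Steered perplexity divided by unsteered perplexity on the same sentence.

    A ratio above 1 means steering made the sentence more surprising;
    below 1 means steering made it less surprising.
    """
    return perplexity(steered) / perplexity(unsteered)


# Hypothetical log-probs for illustration only (not data from the post).
# A model that gives every token probability 0.25 has perplexity about 4.
uniform_quarter = [math.log(0.25)] * 4
print(perplexity(uniform_quarter))

# If steering raised each token's probability from 0.25 to 0.5,
# perplexity would fall from about 4 to about 2, giving a ratio near 0.5.
steered = [math.log(0.5)] * 3
unsteered = [math.log(0.25)] * 3
print(perplexity_ratio(steered, unsteered))
```

Under this framing, the hoped-for outcome is a ratio below 1 on wedding-related sentences and a ratio close to 1 on unrelated sentences.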
