"Steering GPT-2-XL by adding an activation vector" by TurnTrout et al.

LessWrong (Curated & Popular)

GPT-2-XL Is Robust to Activation Noise

The "Love" minus "Hate" and "wedding" minus " " (space) vectors seem to compose with each other, according to our rather brief qualitative tests. Activation additions have already helped us find representations in a model. We're making GPT-2-XL handle activations which we think it never handled during training.
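
To make "compose" concrete, here is a minimal sketch of adding two steering vectors to the same residual-stream vector. It is a toy illustration, not the authors' code: the activations are random stand-ins rather than real GPT-2-XL activations, the helper names `steering_vector` and `add_vectors` are hypothetical, and the coefficients 5.0 and 4.0 are arbitrary assumptions.

```python
import random


def steering_vector(act_plus, act_minus, coeff):
    """Scaled difference of two activation vectors (hypothetical helper)."""
    return [coeff * (p - m) for p, m in zip(act_plus, act_minus)]


def add_vectors(resid, *vectors):
    """Add any number of steering vectors to a residual-stream vector."""
    out = list(resid)
    for vec in vectors:
        out = [o + v for o, v in zip(out, vec)]
    return out


random.seed(0)
d_model = 8  # toy width; GPT-2-XL's residual stream is 1600-dimensional


def fake_activation():
    """Random stand-in for an activation read from the model (assumption)."""
    return [random.gauss(0.0, 1.0) for _ in range(d_model)]


love, hate, wedding, space, resid = (fake_activation() for _ in range(5))

# Coefficients are arbitrary illustrative values.
love_minus_hate = steering_vector(love, hate, coeff=5.0)
wedding_minus_space = steering_vector(wedding, space, coeff=4.0)

# Composition here is just summation: both vectors are added to the same stream.
steered = add_vectors(resid, love_minus_hate, wedding_minus_space)
```

In this sketch the two additions are independent and their order does not matter. Whether the model's behavior combines as cleanly is the empirical question the qualitative tests address.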
