"Steering GPT-2-XL by adding an activation vector" by TurnTrout et al.

LessWrong (Curated & Popular)

The Importance of Activation Additions in Language Models

Alex thinks we really should be able to control which goal the network decides to pursue at inference time, without first understanding the relevant circuitry mechanistically. Activation additions are much cheaper than fine-tuning, in both effort and compute. Alex thinks there's a 65% chance that a competent team could achieve this kind of inference-time goal control within 8 months of serial research.
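To make "adding an activation vector" concrete, here is a toy sketch in plain Python. It is an illustration under stated assumptions, not the authors' code and not GPT-2-XL: the "model" is two hand-written residual blocks acting on 2-dimensional vectors, with no tokens or attention, and the names `add`, `forward`, `steering_vector`, and `BLOCKS` are hypothetical. It assumes the steering vector is the difference between the activations that two contrasting inputs produce at one layer, and that this vector is later added, scaled by a coefficient, to the residual stream at that same layer. No weights are updated; steering needs only forward passes, which is one reason it costs far less than fine-tuning.

```python
# Toy sketch of activation addition (hypothetical names; not the authors' code).
# The "model" is a stack of residual blocks acting on small float vectors.
from typing import Callable, List, Optional

Vector = List[float]
Block = Callable[[Vector], Vector]


def add(a: Vector, b: Vector, scale: float = 1.0) -> Vector:
    """Elementwise a + scale * b."""
    return [x + scale * y for x, y in zip(a, b)]


def forward(blocks: List[Block], x: Vector,
            steer_layer: Optional[int] = None,
            steer: Optional[Vector] = None,
            coeff: float = 0.0) -> List[Vector]:
    """Run the residual stack and return the activation after every block.

    If `steer` is given, coeff * steer is added to the residual stream
    right after block `steer_layer`. Weights are never modified.
    """
    acts: List[Vector] = []
    for i, block in enumerate(blocks):
        x = add(x, block(x))  # residual update: x + f(x)
        if steer is not None and i == steer_layer:
            x = add(x, steer, coeff)  # inject the scaled steering vector
        acts.append(x)
    return acts


def steering_vector(blocks: List[Block], pos: Vector, neg: Vector,
                    layer: int) -> Vector:
    """Activation difference at `layer` between two contrasting inputs."""
    return add(forward(blocks, pos)[layer], forward(blocks, neg)[layer], -1.0)


# Two hand-written toy blocks standing in for transformer layers (assumption).
BLOCKS: List[Block] = [
    lambda v: [v[1], 0.0],
    lambda v: [0.0, v[0]],
]

if __name__ == "__main__":
    vec = steering_vector(BLOCKS, [2.0, 0.0], [0.0, 2.0], layer=0)
    neutral = [1.0, 1.0]
    plain = forward(BLOCKS, neutral)[-1]
    steered = forward(BLOCKS, neutral, steer_layer=0, steer=vec, coeff=0.5)[-1]
    print("steering vector:", vec)  # [0.0, -2.0]
    print("unsteered:", plain)      # [2.0, 3.0]
    print("steered:", steered)      # [2.0, 2.0]
```

With a coefficient of 0.5, the steered run ends at `[2.0, 2.0]` instead of the unsteered `[2.0, 3.0]`: the injected vector shifts everything downstream of the injection layer while the blocks themselves stay fixed. In a real transformer the injection step would plausibly be a forward hook on one block's output, but that mapping is also an assumption of this sketch.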

Play episode from 01:35:45