ROME: Locating and Editing Factual Associations in GPT (Paper Explained & Author Interview)

Yannic Kilcher Videos (Audio Only)

Is This a Causal Tracing Problem?

The researchers used a technique called causal tracing. They record all of these hidden activations on the clean input, then they run the model again with corrupted input. But at one particular hidden state, instead of using what the model computes from the corrupted input, they ignore that hidden state and replace it with the one from the clean input. And now we observe the output. So here, maybe it said Paris before: because something is in downtown, the model just said Paris. However, if copying over that hidden state from the clean signal actually changes the output back from Paris to Seattle... Well, that is a fat marker. Oh, sorry about that. Those are my notes. If that actually changes
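The procedure described above can be sketched on a toy network. This is a minimal illustration under loud assumptions, not the authors' code: the two-token, three-layer tanh model, the names `run` and `restored_distance`, and the choice to corrupt only the first token are all hypothetical stand-ins for GPT and its inputs. The sketch records hidden states on a clean run, reruns on a corrupted input, copies one clean hidden state back in, and measures how close the output gets to the clean output.

```python
import numpy as np

rng = np.random.default_rng(0)
N_LAYERS, N_TOKENS, DIM = 3, 2, 4
# Hypothetical toy network, not GPT: one weight matrix per layer.
WEIGHTS = [rng.normal(size=(DIM, DIM)) for _ in range(N_LAYERS)]


def run(tokens, patch=None):
    """Run the toy model on `tokens` of shape (N_TOKENS, DIM).

    `patch` is None or (layer, position, vector): after that layer is
    computed, the hidden state at that position is overwritten with `vector`.
    Returns (output, hidden), where hidden[layer] has shape (N_TOKENS, DIM).
    """
    h = tokens
    hidden = []
    for layer, w in enumerate(WEIGHTS):
        # Causal mixing: each position sees the mean of itself and earlier positions.
        context = np.cumsum(h, axis=0) / np.arange(1, N_TOKENS + 1)[:, None]
        h = np.tanh((h + context) @ w.T)
        if patch is not None and patch[0] == layer:
            h = h.copy()
            h[patch[1]] = patch[2]  # restore the clean hidden state here
        hidden.append(h)
    return h[-1], hidden  # output = last position, last layer


clean = rng.normal(size=(N_TOKENS, DIM))
corrupted = clean.copy()
corrupted[0] += rng.normal(size=DIM)  # assumption: corrupt only the first token

# 1) Clean run: record every hidden state.
clean_out, clean_hidden = run(clean)
# 2) Corrupted run: the output generally drifts away from the clean one.
corrupted_out, _ = run(corrupted)


def restored_distance(layer, position):
    """Distance to the clean output after restoring one clean hidden state."""
    vector = clean_hidden[layer][position]
    out, _ = run(corrupted, patch=(layer, position, vector))
    return float(np.linalg.norm(out - clean_out))


# 3) Corrupted run with one restored state, for every (layer, position).
effects = np.array([[restored_distance(layer, pos) for pos in range(N_TOKENS)]
                    for layer in range(N_LAYERS)])
print("corrupted vs clean:", float(np.linalg.norm(corrupted_out - clean_out)))
print(effects)  # rows = layers, columns = positions; smaller = closer to clean
```

In this toy, a small entry in `effects` means that restoring that single hidden state moves the output most of the way back to the clean answer, which is the kind of signal the description above is looking for. With a real model the same idea would typically be implemented by intercepting activations during the forward pass rather than by a hand-written loop.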
