ROME: Locating and Editing Factual Associations in GPT (Paper Explained & Author Interview)

Yannic Kilcher Videos (Audio Only)

Scaling a Distributed Network in GPT-2 XL

In GPT-2 XL, we use layer 17, right? And we find that the causal effects peak there. If you're trying to insert lots of facts, then maybe trying to pile them all into the same matrix might not scale that well. But if we just pick the single layer that's most effective, then it works for all these facts. We end up in a situation where we have this distributed network, this distributed representation.
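To make the remark about piling facts into one matrix concrete, here is a minimal sketch of a rank-one edit on a toy weight matrix. It is a simplified illustration, not the authors' implementation: the helper names (`matvec`, `rank_one_insert`), the tiny 2x3 matrix, and the key and value vectors are all hypothetical, and the update rule is the plain least-change rank-one formula rather than the full ROME objective.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rank_one_insert(W, k, v):
    """Return a copy of W with a rank-one update so that W_new @ k == v.

    Assumed simplified rule: W_new = W + (v - W k) k^T / (k . k).
    """
    kk = sum(ki * ki for ki in k)
    if kk == 0:
        raise ValueError("key vector must be non-zero")
    residual = [vi - wi for vi, wi in zip(v, matvec(W, k))]
    return [[w + r * kj / kk for w, kj in zip(row, k)]
            for row, r in zip(W, residual)]


# Hypothetical toy matrix standing in for one MLP weight matrix at a single layer.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]

# Insert a first "fact": the key vector should now map to the value vector.
key_a, value_a = [1.0, 2.0, 0.0], [5.0, -3.0]
W_a = rank_one_insert(W, key_a, value_a)
recalled_a = matvec(W_a, key_a)

# A direction orthogonal to the key is left unchanged by the edit.
probe = [0.0, 0.0, 1.0]
probe_before, probe_after = matvec(W, probe), matvec(W_a, probe)

# Insert a second fact whose key overlaps with the first key.
key_b, value_b = [1.0, 0.0, 0.0], [0.0, 0.0]
W_ab = rank_one_insert(W_a, key_b, value_b)
recalled_a_after_b = matvec(W_ab, key_a)

# Distance between what the first key recalls now and what was inserted.
drift = max(abs(x - y) for x, y in zip(recalled_a_after_b, value_a))
```

In this toy setup the first edit recalls its value exactly, while the second edit, whose key is not orthogonal to the first, shifts what the first key recalls (`drift` is non-zero). That is one simple way to picture why piling many edits into the same matrix might not scale well.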
