"SolidGoldMagikarp (plus, prompt generation)"

LessWrong (Curated & Popular)

The GPT-J Model - The 50 Closest-to-Centroid Tokens

The idea that GPT token embeddings are normalized to norm 1 is just blatantly untrue. But a revised hypothesis, that many of the tokens we were seeing were among those closest to the centroid of the entire set of 50,257 tokens, turned out to be correct. Here are the 50 closest-to-centroid tokens for the GPT-J model. Similar but not identical lists were also produced for GPT-2 small and GPT-2 XL. All of this data will be included in a follow-up post.
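
As a rough illustration of what "closest to centroid" means here, the sketch below averages all embedding vectors to get a centroid and then ranks tokens by their distance to it. It is a minimal sketch, not the authors' code: the function name `closest_to_centroid`, the use of Euclidean distance, and the tiny two-dimensional `toy` data are all assumptions made for illustration. A real run would use the model's full token embedding matrix instead.

```python
import math


def closest_to_centroid(embeddings, k):
    """Return the indices of the k rows nearest to the mean of all rows.

    Assumption: distance is Euclidean; the episode excerpt does not name a metric.
    """
    n = len(embeddings)
    dim = len(embeddings[0])
    # Centroid = component-wise mean of every embedding vector.
    centroid = [sum(row[j] for row in embeddings) / n for j in range(dim)]
    # Distance from each embedding to the centroid.
    distances = [math.dist(row, centroid) for row in embeddings]
    # Indices sorted by ascending distance, truncated to the k nearest.
    return sorted(range(n), key=lambda i: distances[i])[:k]


# Hypothetical toy "embeddings": four corner points and one point near the middle.
toy = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0], [2.0, 2.1]]
nearest = closest_to_centroid(toy, 1)
print(nearest)
```

With the toy data, the point near the middle (index 4) is the one nearest the centroid. Applied to a real embedding matrix with `k=50`, the same ranking would give a list of the kind described above.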
