
"SolidGoldMagikarp (plus, prompt generation)"

LessWrong (Curated & Popular)


Clustering Tokens in Embedding Space

We were interested in the semantic relevance of the clusters produced by the K-means algorithm. We looked for the nearest legal token embedding to the centroid of each cluster. Over many runs, we kept seeing the same handful of tokens playing this role. There were what appeared to be some special characters, but also long, unfamiliar strings like "TheNitromeFan" and "SolidGoldMagikarp". The puzzling tokens seemed to have a tendency to aggregate together into a few clusters of their own.
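The procedure described here (cluster the embeddings with K-means, then find the token whose embedding lies closest to each centroid) can be sketched roughly as follows. This is a minimal illustration, not the speakers' code: the embeddings are made-up 2-D toy vectors, the function names `kmeans` and `nearest_token` are hypothetical, Euclidean distance is an assumption, and every key in the dictionary is treated as a "legal" token.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's-algorithm K-means; returns the k centroids."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


def nearest_token(centroid, embeddings):
    """Token whose embedding is closest to the given centroid."""
    return min(embeddings, key=lambda tok: squared_distance(embeddings[tok], centroid))


# Toy data (assumption): two well-separated groups of 2-D "embeddings".
toy_embeddings = {
    "cat": [0.0, 0.0], "dog": [1.0, 0.0], "fox": [0.4, 0.3],
    "red": [9.0, 9.0], "blue": [10.0, 9.0], "green": [9.4, 9.3],
}

centroids = kmeans(list(toy_embeddings.values()), k=2)
for c in centroids:
    print(nearest_token(c, toy_embeddings), c)
```

With a real model, the dictionary would hold one high-dimensional vector per vocabulary token, and repeating the run with different seeds would show whether the same tokens keep turning up nearest the centroids.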

Play episode from 11:13
