
Data Science at Home
From Tokens to Vectors: The Efficiency Hack That Could Save AI (Ep. 294)
Nov 11, 2025
Dive into the game-changing concept of Continuous Autoregressive Language Models, which compress tokens into vectors for speedier text generation. Discover how this innovation can slash AI costs by 44%. The discussion covers the implications of a likelihood-free training method and introduces the concept of semantic bandwidth for denser information. Also explored are the environmental benefits of more efficient models and the importance of open science versus corporate secrecy in research. Perfect for tech enthusiasts eager for the latest in AI advancements!
AI Snips
Pack Tokens Into Vectors
- CALM compresses multiple tokens into a single continuous vector, letting models generate k tokens per step instead of one.
- This reduces generation steps (e.g., k=4 cuts steps by ~4x) and can greatly speed up inference and training.
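The packing idea can be illustrated with a toy sketch. Everything here (the linear encoder/decoder, the dimensions, the function names) is hypothetical and only shows the bookkeeping: k token embeddings are squeezed into one continuous vector, so a sequence of n tokens needs roughly n/k generation steps instead of n.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 4        # tokens packed per vector (the "k" from the episode)
VOCAB = 1000 # toy vocabulary size
D_EMB = 32   # per-token embedding dimension (made up)
D_LAT = 64   # latent vector dimension (made up)

# Hypothetical linear autoencoder: k token embeddings -> one vector -> k tokens.
embed = rng.normal(size=(VOCAB, D_EMB))
W_enc = rng.normal(size=(K * D_EMB, D_LAT)) / np.sqrt(K * D_EMB)
W_dec = rng.normal(size=(D_LAT, K * D_EMB)) / np.sqrt(D_LAT)

def encode_chunk(token_ids):
    """Pack k tokens into a single continuous latent vector."""
    x = embed[token_ids].reshape(-1)   # (k * d_emb,)
    return x @ W_enc                   # (d_lat,)

def decode_chunk(z):
    """Unpack a latent vector back into k token ids via nearest embedding."""
    x = (z @ W_dec).reshape(K, D_EMB)
    return np.argmax(x @ embed.T, axis=1)

tokens = rng.integers(0, VOCAB, size=16)
n_steps_token_lm = len(tokens)       # a standard LM: one step per token
n_steps_calm = len(tokens) // K      # CALM-style: one step per k-token vector
print(n_steps_token_lm, n_steps_calm)
```

With k=4, a 16-token sequence drops from 16 generation steps to 4, which is the ~4x step reduction the snip refers to; a real model would learn the encoder/decoder rather than use random matrices.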
Train Without Probabilities
- Continuous outputs break softmax-based likelihood training, so CALM trains likelihood-free using energy scores and proper scoring rules.
- They use sample-based scoring (energy/Brier approaches) to incentivize matching the true distribution without explicit probabilities.
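A minimal sketch of the energy-score idea, using only the textbook form of the score (E||X - y|| - ½ E||X - X'||, a strictly proper scoring rule estimated from Monte-Carlo samples), not the paper's actual training loop. The key property it demonstrates: the loss can be computed purely from samples drawn from the model, with no softmax or explicit probability anywhere.

```python
import numpy as np

def energy_score_loss(samples, target):
    """
    Monte-Carlo estimate of the energy score E||X - y|| - 0.5 * E||X - X'||.
    As a strictly proper scoring rule, it is minimized in expectation when
    the sampler matches the true distribution of y -- no explicit
    probabilities required.
    """
    m = len(samples)
    # attraction term: samples should land close to the observed target
    attract = np.mean([np.linalg.norm(x - target) for x in samples])
    # repulsion term: samples should spread out, not collapse to one point
    repulse = np.mean([np.linalg.norm(samples[i] - samples[j])
                       for i in range(m) for j in range(m) if i != j])
    return attract - 0.5 * repulse

rng = np.random.default_rng(1)
target = np.ones(8)
good = rng.normal(loc=1.0, scale=0.1, size=(16, 8))   # sampler near the target
bad = rng.normal(loc=-1.0, scale=0.1, size=(16, 8))   # sampler far from it
print(energy_score_loss(good, target) < energy_score_loss(bad, target))
```

A model whose samples resemble the data gets a lower loss than one whose samples do not, which is exactly the incentive a likelihood-free objective needs.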
Evaluate Without Perplexity
- The paper repurposes the Brier score (as a metric dubbed BrierLM), estimated purely from samples, as an evaluation metric that correlates strongly with cross-entropy.
- They report a -0.966 Pearson correlation between their sample-based score and standard cross-entropy measures.
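The trick behind a sample-only Brier estimate can be sketched for a single categorical prediction; the estimator below is the standard identity sum_i (p_i - y_i)^2 = sum_i p_i^2 - 2*p_y + 1, where P(x1 == x2) for two independent draws estimates sum_i p_i^2 and P(x1 == y) estimates p_y. This is an illustration of the general idea, not the paper's exact BrierLM formula.

```python
import numpy as np

def brier_estimate(sample_pairs, target):
    """
    Sample-based estimate of the Brier score sum_i (p_i - y_i)^2 for a
    one-hot target y. Uses only pairs of independent draws from the model:
      P(x1 == x2) estimates sum_i p_i^2 (a "collision" probability),
      P(x1 == target) estimates p_target.
    No explicit probability distribution is ever materialized.
    """
    x1, x2 = sample_pairs[:, 0], sample_pairs[:, 1]
    collision = np.mean(x1 == x2)   # estimates sum_i p_i^2
    hit = np.mean(x1 == target)     # estimates p_target
    return collision - 2.0 * hit + 1.0

rng = np.random.default_rng(2)
# Toy "models" over 10 classes: one sharp and correct, one uniform.
sharp = rng.choice(10, size=(5000, 2), p=[0.9] + [0.1 / 9] * 9)
uniform = rng.choice(10, size=(5000, 2))
print(brier_estimate(sharp, target=0), brier_estimate(uniform, target=0))
```

A model concentrated on the right answer scores much lower (better) than a uniform one, so the estimator orders models the way cross-entropy would, consistent with the strong negative correlation reported in the episode.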
