
Data Science at Home
From Tokens to Vectors: The Efficiency Hack That Could Save AI (Ep. 294)
Nov 11, 2025
Dive into the game-changing concept of Continuous Autoregressive Language Models, which compress tokens into vectors for speedier text generation. Discover how this innovation can slash AI costs by 44%. The discussion covers the implications of a likelihood-free training method and introduces the concept of semantic bandwidth for denser information. Also explored are the environmental benefits of more efficient models and the importance of open science versus corporate secrecy in research. Perfect for tech enthusiasts eager for the latest in AI advancements!
AI Snips
Pack Tokens Into Vectors
- CALM compresses multiple tokens into a single continuous vector, letting models generate k tokens per step instead of one.
- This reduces generation steps (e.g., k=4 cuts steps by ~4x) and can greatly speed up inference and training.
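The packing idea can be illustrated with a toy sketch. Everything here (the linear encoder/decoder, the dimensions, the function names) is hypothetical and only shows the bookkeeping: k token embeddings are squeezed into one continuous vector, so a sequence of n tokens needs roughly n/k generation steps instead of n.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 4        # tokens packed per vector (the "k" from the episode)
VOCAB = 1000 # toy vocabulary size
D_EMB = 32   # per-token embedding dimension (made up)
D_LAT = 64   # latent vector dimension (made up)

# Hypothetical linear autoencoder: k token embeddings -> one vector -> k tokens.
embed = rng.normal(size=(VOCAB, D_EMB))
W_enc = rng.normal(size=(K * D_EMB, D_LAT)) / np.sqrt(K * D_EMB)
W_dec = rng.normal(size=(D_LAT, K * D_EMB)) / np.sqrt(D_LAT)

def encode_chunk(token_ids):
    """Pack k tokens into a single continuous latent vector."""
    x = embed[token_ids].reshape(-1)   # (k * d_emb,)
    return x @ W_enc                   # (d_lat,)

def decode_chunk(z):
    """Unpack a latent vector back into k token ids via nearest embedding."""
    x = (z @ W_dec).reshape(K, D_EMB)
    return np.argmax(x @ embed.T, axis=1)

tokens = rng.integers(0, VOCAB, size=16)
n_steps_token_lm = len(tokens)       # a standard LM: one step per token
n_steps_calm = len(tokens) // K      # CALM-style: one step per k-token vector
print(n_steps_token_lm, n_steps_calm)
```

With k=4, a 16-token sequence drops from 16 generation steps to 4, which is the ~4x step reduction the snip refers to; a real model would learn the encoder/decoder rather than use random matrices.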
Train Without Probabilities
- Continuous outputs break softmax-based likelihood training, so CALM trains likelihood-free using energy scores and proper scoring rules.
- They use sample-based scoring (energy/Brier approaches) to incentivize matching the true distribution without explicit probabilities.
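A minimal sketch of the energy-score idea, using only the textbook form of the score (E||X - y|| - ½ E||X - X'||, a strictly proper scoring rule estimated from Monte-Carlo samples), not the paper's actual training loop. The key property it demonstrates: the loss can be computed purely from samples drawn from the model, with no softmax or explicit probability anywhere.

```python
import numpy as np

def energy_score_loss(samples, target):
    """
    Monte-Carlo estimate of the energy score E||X - y|| - 0.5 * E||X - X'||.
    As a strictly proper scoring rule, it is minimized in expectation when
    the sampler matches the true distribution of y -- no explicit
    probabilities required.
    """
    m = len(samples)
    # attraction term: samples should land close to the observed target
    attract = np.mean([np.linalg.norm(x - target) for x in samples])
    # repulsion term: samples should spread out, not collapse to one point
    repulse = np.mean([np.linalg.norm(samples[i] - samples[j])
                       for i in range(m) for j in range(m) if i != j])
    return attract - 0.5 * repulse

rng = np.random.default_rng(1)
target = np.ones(8)
good = rng.normal(loc=1.0, scale=0.1, size=(16, 8))   # sampler near the target
bad = rng.normal(loc=-1.0, scale=0.1, size=(16, 8))   # sampler far from it
print(energy_score_loss(good, target) < energy_score_loss(bad, target))
```

A model whose samples resemble the data gets a lower loss than one whose samples do not, which is exactly the incentive a likelihood-free objective needs.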
Evaluate Without Perplexity
- The paper repurposes the Brier score (as a metric dubbed BrierLM), estimated purely from samples, as an evaluation metric that correlates strongly with cross-entropy.
- They report a -0.966 Pearson correlation between their sample-based score and standard cross-entropy measures.
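The trick behind a sample-only Brier estimate can be sketched for a single categorical prediction; the estimator below is the standard identity sum_i (p_i - y_i)^2 = sum_i p_i^2 - 2*p_y + 1, where P(x1 == x2) for two independent draws estimates sum_i p_i^2 and P(x1 == y) estimates p_y. This is an illustration of the general idea, not the paper's exact BrierLM formula.

```python
import numpy as np

def brier_estimate(sample_pairs, target):
    """
    Sample-based estimate of the Brier score sum_i (p_i - y_i)^2 for a
    one-hot target y. Uses only pairs of independent draws from the model:
      P(x1 == x2) estimates sum_i p_i^2 (a "collision" probability),
      P(x1 == target) estimates p_target.
    No explicit probability distribution is ever materialized.
    """
    x1, x2 = sample_pairs[:, 0], sample_pairs[:, 1]
    collision = np.mean(x1 == x2)   # estimates sum_i p_i^2
    hit = np.mean(x1 == target)     # estimates p_target
    return collision - 2.0 * hit + 1.0

rng = np.random.default_rng(2)
# Toy "models" over 10 classes: one sharp and correct, one uniform.
sharp = rng.choice(10, size=(5000, 2), p=[0.9] + [0.1 / 9] * 9)
uniform = rng.choice(10, size=(5000, 2))
print(brier_estimate(sharp, target=0), brier_estimate(uniform, target=0))
```

A model concentrated on the right answer scores much lower (better) than a uniform one, so the estimator orders models the way cross-entropy would, consistent with the strong negative correlation reported in the episode.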
