Anthropic’s superposition paper claims that models are under-parameterized
The superposition hypothesis in Anthropic's paper suggests that models are under-parameterized relative to the data they face: when the data is high-dimensional and sparse, as internet-scale text is, the model adopts a compression strategy, packing more features of the world into its weights than it has parameters. Superposition arises precisely in this sparse, high-dimensional regime.

This is also why neural networks are hard to interpret: under superposition, individual neurons respond to mixtures of unrelated features, so each neuron's contribution to the output is entangled and confusing. The compression can be undone by projecting activations into a higher-dimensional space and applying a sparsity penalty, which recovers cleaner, more interpretable features. Contrary to the popular belief that deep learning models are over-parameterized, the paper's claim is that they are dramatically under-parameterized, given the complexity of the tasks they are designed to handle.
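As a rough illustration of that "project up and sparsify" idea, here is a minimal sparse-autoencoder sketch in PyTorch. The class name, the 8x expansion factor, and the `l1_coeff` value are illustrative assumptions, not the exact architecture or hyperparameters from Anthropic's work.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Projects activations into a higher-dimensional feature space.

    An L1 penalty on the hidden features encourages each input to
    activate only a few of them, "undoing" the superposed packing.
    """
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)  # d_hidden >> d_model
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # sparse features
        reconstruction = self.decoder(features)
        return reconstruction, features

def sae_loss(reconstruction, activations, features, l1_coeff=1e-3):
    # Reconstruction term keeps the decoded activations faithful;
    # the L1 term is the sparsity penalty described above.
    recon_loss = (reconstruction - activations).pow(2).mean()
    sparsity = features.abs().mean()
    return recon_loss + l1_coeff * sparsity

# Hypothetical usage on activations from some layer of width 512,
# expanded 8x into 4096 candidate features:
sae = SparseAutoencoder(d_model=512, d_hidden=4096)
acts = torch.randn(64, 512)
recon, feats = sae(acts)
loss = sae_loss(recon, acts, feats)
loss.backward()
```

The key design choice is the trade-off in the loss: the reconstruction term alone would let the autoencoder copy the compressed representation, while the sparsity term forces it to spread the information across many dimensions, each ideally corresponding to one interpretable feature.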