
What’s up with LLMs representing XORs of arbitrary features?
LessWrong (Curated & Popular)
Generating Representations for Classification and Linear Features (chapter begins at 21:52)
This chapter explores how a representation can end up containing a direction that is useful for classification even though that direction was never deliberately computed. The speaker describes experiments on a reset version of llama2-13b, produced by shuffling the weights within each parameter, and discusses the linear representation of token-level features and the challenge of selecting the true-versus-false task using basic heuristics.
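As a rough illustration of the setup the summary describes, the following is a minimal Python sketch, not the speaker's actual code: it (a) produces a "reset" model by randomly permuting the entries within each parameter tensor, and (b) fits a linear probe on residual-stream activations to check whether some direction separates true from false statements. The checkpoint name, layer index, toy dataset, and the use of Hugging Face transformers, PyTorch, and scikit-learn are all assumptions for illustration.

import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-13b-hf"  # assumed checkpoint; a smaller causal LM also works for trying the sketch

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.eval()

# (a) "Reset" the network: shuffle the entries within each parameter tensor.
# This destroys the learned structure while preserving each tensor's shape
# and elementwise weight distribution.
with torch.no_grad():
    for _, param in model.named_parameters():
        flat = param.detach().flatten()
        perm = torch.randperm(flat.numel(), device=flat.device)
        param.copy_(flat[perm].view_as(param))

# (b) Probe whether some direction in the residual stream separates two classes.
LAYER = 20  # arbitrary middle layer, chosen for illustration

def last_token_activation(text):
    """Hidden state of the final token at LAYER."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[LAYER][0, -1].float().cpu().numpy()

# Toy labeled statements standing in for a real true/false dataset.
statements = ["Paris is the capital of France.", "Paris is the capital of Spain."]
labels = [1, 0]  # 1 = true, 0 = false

X = np.stack([last_token_activation(s) for s in statements])
probe = LogisticRegression(max_iter=1000).fit(X, labels)
# probe.coef_ is a candidate classification direction in activation space,
# found after the fact rather than deliberately computed by the model.

In practice one would use many labeled statements and held-out data; the point of the sketch is only that the probe direction is recovered from activations afterwards rather than being built into the (shuffled) model.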