
What’s up with LLMs representing XORs of arbitrary features?

LessWrong (Curated & Popular)


Implications of RAX and Generalization in Linear Probes

This chapter examines a concept called RAX — the observation that models represent XORs of arbitrary features — and its implications for generalization in linear probes. It covers how linear probes behave in machine learning models, introduces the notion of feature saliency, and closes with takeaways for practitioners who work with model internals.
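The relevance of XOR to linear probes can be illustrated with a toy example (a hypothetical sketch for illustration, not code from the episode): the XOR of two binary features is not linearly separable in those features alone, so a linear probe can read out a ⊕ b only if the XOR is itself represented as an explicit direction. Below, a hand-rolled logistic-regression "probe" fails on raw (a, b) inputs but succeeds once an explicit XOR feature is appended.

```python
import numpy as np

def probe_accuracy(X, y, lr=0.5, steps=3000):
    """Train a tiny logistic-regression probe by gradient descent
    and return its training accuracy."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w + b)))  # sigmoid of the logits
        grad = p - y                        # d(BCE loss)/d(logit)
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    preds = (X @ w + b) > 0
    return (preds == y).mean()

# The four Boolean inputs and the XOR label a ⊕ b.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Probe on (a, b) alone: XOR is not linearly separable,
# so no linear probe can classify all four points correctly.
acc_raw = probe_accuracy(X, y)

# Probe with an explicit a ⊕ b feature appended: now the label
# is itself one of the input directions, so a linear probe succeeds.
X_xor = np.hstack([X, (X[:, :1] != X[:, 1:]).astype(float)])
acc_xor = probe_accuracy(X_xor, y)

print(acc_raw, acc_xor)
```

The same logic motivates the surprise in the post: if a linear probe on a model's activations can recover a ⊕ b for arbitrary feature pairs, the model must be representing those XOR directions somewhere.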
