
Neel Nanda - Mechanistic Interpretability (Sparse Autoencoders)

Machine Learning Street Talk (MLST)

00:00

Exploring the Gemmascope Project

This chapter examines the Gemma Scope project, which uses sparse autoencoders to advance mechanistic interpretability in language models. It features a discussion of manipulating feature vectors, illustrated by the Golden Gate Claude example, to show how individual features can be surfaced and amplified to shape model behavior.
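As context for the discussion, the sketch below illustrates the two ideas the chapter refers to: a sparse autoencoder that decomposes a model's activations into interpretable features, and "steering" by adding one feature's decoder direction back into the activations, as in the Golden Gate Claude example. This is a minimal illustration with made-up names and toy dimensions, not the actual Gemma Scope or Anthropic code.

```python
# Minimal sparse-autoencoder (SAE) and feature-steering sketch.
# All class/function names and sizes are illustrative assumptions.
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, activations: torch.Tensor):
        # ReLU keeps only a sparse set of active features per token.
        features = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(features)
        return features, reconstruction


def steer(activations: torch.Tensor, sae: SparseAutoencoder,
          feature_idx: int, strength: float) -> torch.Tensor:
    # Add a multiple of one feature's decoder direction to every token's
    # activation vector, biasing the model toward that feature's concept.
    direction = sae.decoder.weight[:, feature_idx]  # shape: (d_model,)
    return activations + strength * direction


if __name__ == "__main__":
    d_model, d_features = 256, 1024  # toy sizes; real SAEs are far wider
    sae = SparseAutoencoder(d_model, d_features)
    acts = torch.randn(4, d_model)   # stand-in for residual-stream activations
    feats, recon = sae(acts)
    steered = steer(acts, sae, feature_idx=42, strength=5.0)
    print(feats.shape, recon.shape, steered.shape)
```

In practice the SAE is trained to reconstruct a chosen layer's activations under a sparsity penalty, and steering is applied by patching the modified activations back into the forward pass; the sketch only shows the vector arithmetic involved.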
