Neel Nanda - Mechanistic Interpretability (Sparse Autoencoders)

Machine Learning Street Talk (MLST)

Challenges of Sparse Autoencoders

This chapter addresses the complexities of modifying machine learning models, focusing on the performance degradation observed in 'steered' models. It explores the architecture of sparse autoencoders, emphasizing why sparsity matters for feature representation and the challenge of disentangling features that are entangled in a model's activations. The discussion also covers the interpretability of latent variables and the open research questions around how these models can surface hidden features in datasets.
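As a rough illustration of the architecture the chapter describes, here is a minimal sparse-autoencoder sketch in PyTorch, assuming the common recipe from the mechanistic interpretability literature: a single-layer ReLU encoder into an overcomplete latent space, a linear decoder, and an L1 penalty that pushes latent activations toward sparsity. The dimensions, class names, and l1_coeff value are illustrative, not taken from the episode.

import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_latent: int):
        super().__init__()
        # Overcomplete dictionary: d_latent is typically several times d_model.
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, activations: torch.Tensor):
        # ReLU keeps latents non-negative; most end up exactly zero
        # once the sparsity penalty has done its work.
        latents = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(latents)
        return reconstruction, latents


def sae_loss(reconstruction, activations, latents, l1_coeff=1e-3):
    # Reconstruction term: stay faithful to the original activations.
    mse = (reconstruction - activations).pow(2).mean()
    # Sparsity term: penalise the L1 norm of the latents so only a
    # handful of them fire for any given input.
    sparsity = latents.abs().sum(dim=-1).mean()
    return mse + l1_coeff * sparsity


# Illustrative usage on random data standing in for model activations.
sae = SparseAutoencoder(d_model=512, d_latent=4096)
acts = torch.randn(32, 512)
recon, latents = sae(acts)
loss = sae_loss(recon, acts, latents)
loss.backward()

The trade-off the chapter circles around lives in l1_coeff: raise it and the latents become sparser and often more interpretable, but reconstruction quality (and hence the behaviour of a model run through the autoencoder) degrades.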
