Neel Nanda - Mechanistic Interpretability

Machine Learning Street Talk (MLST)

Examples of limitations in current AI interpretability methods

One example of unintentional deception in AI models arises because a model's inputs and outputs cannot be fully trusted. Another example shows the limitations of using chain of thought as an interpretability method. Instead, it is necessary to engage with a model's internal mechanisms for better interpretability. The speaker believes that ambitious interpretability is possible.
