The Emergence of Mechanistic Interpretability

The guiding thread is for me and what excites me moving forward is how can we have again building inspiration taking inspiration from working comparative and developmental psychology. I'm very excited in particular by this kind of emerging field of research called mechanistic interpretability that involves going beyond behavioral behavior or investing or before behavioral evaluation of the models using benchmarks. It actually tries to pry open these models and reverse engineer the circuits that emerge from training within these models but you cannot a priori know just by looking at the architecture of the model or its learning objective.

Play episode from 02:00:21

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app