The Future of Interpretability Research

Robinson: Much interpretability research has been done at relatively small scales with humans in the loop. But there is a gap between this and what we would really need to fix AI, he says. "I think it's probably going to be very, very important and central to highly relevant forms of interpretability."

