Towards a Rigid Science of Interpretability

The paper suggests three different levels of testing. The golden standard is to test with actual experts and the users, it says. It's like real humans, real tasks, real humans simplify tasks, and then no humans in proxy tasks. But you can test with bigger pool of people, maybe turkers, maybe your co co student grad students or others,. You can sort of go through this levels, but at the end to evaluate your method...

Play episode from 11:56

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app