Towards a Rigid Science of Interpretability
The paper suggests three different levels of testing. The gold standard, it says, is to test with actual experts and users. The levels are real humans on real tasks, real humans on simplified tasks, and then no humans on proxy tasks. But you can also test with a bigger pool of people, maybe Turkers, maybe your fellow grad students or others. You can sort of go through these levels, but at the end to evaluate your method...