3. Evan Hubinger on Takeoff speeds, Risks from learned optimization & Interpretability

The Inside View

Visualized Features in CoinRun and the Reward

The point of Distill is to try to get, you know, interpretability research more attention and make it more prestigious. So if you're forcing your models to be interpretable, a good analogy would be forcing your students to show that they've done good work. So they are not hiding the actual optimization, but they are showing everything. If they're transparent, explicitly transparent, it is harder for them to lie. You visualize features in CoinRun and the reward. I think the Microscope from OpenAI, where you see
