"Against Almost Every Theory of Impact of Interpretability" by Charbel-Raphaël

LessWrong (Curated & Popular)

Exploring Neuron Ablation and Shapley Score Optimization for Network Robustness and Alignment

This segment covers methods such as neuron ablation and Shapley score optimization for addressing biases and vulnerabilities in neural networks, and discusses network robustness, adversarial attacks, safety, and alignment with generative models.

