Neel Nanda - Mechanistic Interpretability

Machine Learning Street Talk (MLST)

00:00

Exploring AI Interpretability and Security Challenges

This chapter explores the relationship between AI interpretability and security, focusing on adversarial examples and jailbreaks. It highlights how understanding a model's internal behavior can strengthen defenses against these threats, using the CLIP model as a case study.
