Neel Nanda - Mechanistic Interpretability

Machine Learning Street Talk (MLST)

CHAPTER

Exploring AI Interpretability and Security Challenges

This chapter examines the relationship between AI interpretability and security, focusing on adversarial examples and jailbreaks. It discusses how understanding a model's behavior can strengthen defenses against these threats, using CLIP as a case study.
