Unmasking Backdoors in Machine Learning Models

This chapter discusses an experiment showing that certain models change their behavior when a backdoor trigger appears in their input prompts. It covers the difficulty of reliably detecting backdoors, how variability in training data affects them, and the intriguing question of whether models are aware of their own backdoors.

Chapter starts at 01:05:38.