42 - Owain Evans on LLM Psychology

AXRP - the AI X-risk Research Podcast

Unmasking Backdoors in Machine Learning Models

This chapter explores an experiment showing how certain models change their behavior when a backdoor trigger appears in the input prompt. It discusses the difficulty of reliably detecting backdoors, the effect of variability in the training data, and the intriguing question of whether models are aware of their own backdoors.