21 - Interpretability for Engineers with Stephen Casper

AXRP - the AI X-risk Research Podcast

The Importance of Human Oversight in the Design of Trojans

The researchers wanted the Trojans to be human-perceptible, but found that this restricted their ability to use them in a meaningful way. They say there is some value to human oversight, even if only for convenience. The research could lead to new ways of detecting and recovering Trojans.
