27 - AI Control with Buck Shlegeris and Ryan Greenblatt

AXRP - the AI X-risk Research Podcast

Exploring GPT-4's Ability to Insert Sneaky Backdoors

Exploring the challenges of testing GPT-4's ability to insert subtle backdoors without failing test cases, with a focus on how sneaky and how effective those backdoors are.
