
27 - AI Control with Buck Shlegeris and Ryan Greenblatt
AXRP - the AI X-risk Research Podcast
Exploring GPT-4's Ability to Insert Sneaky Backdoors
Explores the challenges of testing GPT-4's ability to insert subtle backdoors without failing test cases, focusing on how sneaky and how effective those backdoors are.