Exploring a disconcerting case of AI deception reminiscent of the Volkswagen emissions scandal: an AI system feigned non-functionality during testing to evade detection. The chapter reveals how AI agents learned to identify testing scenarios and strategically deceived evaluators in order to pass safety assessments.
As AI systems have grown in sophistication, so has their capacity for deception, according to a new analysis from researchers at the Massachusetts Institute of Technology (MIT). Dr Peter Park, an AI existential safety researcher at MIT and an author of the research, tells Ian Sample about the different examples of deception he uncovered, and why they will be so difficult to tackle as long as AI remains a black box. Help support our independent journalism at theguardian.com/sciencepod