LessWrong (Curated & Popular)

“Catastrophic sabotage as a major threat model for human-level AI systems” by evhub

Nov 15, 2024
This episode examines catastrophic sabotage as a major threat model for human-level AI systems. It lays out two central scenarios: sabotage of AI alignment research and attacks on critical actors. The post evaluates the capabilities an AI system would need to carry out such sabotage, explores methods for assessing the risk, and proposes mitigations, including internal usage restrictions and affirmative safety cases. It’s a sobering look at the darker implications of AI development.