As 2024 draws to a close, we want to thank you for your continued support for AI safety and review what we’ve been able to accomplish. In this special-edition newsletter, we highlight some of our most important projects from the year.
The mission of the Center for AI Safety is to reduce societal-scale risks from AI. We focus on three pillars of work: research, field-building, and advocacy.
Research
CAIS conducts both technical and conceptual research on AI safety. Here are some highlights from our research in 2024:
Circuit Breakers. We published breakthrough research showing how circuit breakers can prevent AI models from behaving dangerously by interrupting outputs that would enable crime. In a jailbreaking competition with a prize pool of tens of thousands of dollars, it took twenty thousand attempts to jailbreak a model trained with circuit breakers. The paper was accepted to NeurIPS 2024.
The WMDP Benchmark. We developed the Weapons [...]
---
Outline:
(00:34) Research
(04:25) Advocacy
(06:44) Field-Building
(10:38) Looking Ahead
---
First published: December 19th, 2024
Source: https://newsletter.safe.ai/p/aisn-45-center-for-ai-safety-2024
---
Narrated by TYPE III AUDIO.
---