AI Safety Newsletter

Center for AI Safety
Aug 1, 2023 • 16min

AISN #17: Automatically Circumventing LLM Guardrails, the Frontier Model Forum, and Senate Hearing on AI Oversight.

Automatically Circumventing LLM Guardrails

Large language models (LLMs) can generate hazardous information, such as step-by-step instructions on how to create a pandemic pathogen. To combat the risk of malicious use, companies typically build safety guardrails intended to prevent LLMs from misbehaving. But these safety controls are almost useless against a new attack developed by researchers at Carnegie Mellon University and the Center for AI Safety. By studying the vulnerabilities in open source models such as Meta’s LLaMA 2, the researchers can automatically generate a nearly unlimited supply of “adversarial suffixes,” which are words and characters that cause any model’s safety controls to fail. This discovery calls into question the fundamental limits of safety [...]

Outline:
(00:12) Automatically Circumventing LLM Guardrails
(05:40) AI Labs Announce the Frontier Model Forum
(07:54) Senate Hearing on AI Oversight
(14:42) Links

First published: August 1st, 2023
Source: https://newsletter.safe.ai/p/ai-safety-newsletter-17

Want more? Check out our ML Safety Newsletter for technical safety research.

Narrated by TYPE III AUDIO.
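Conceptually, the attack treats jailbreaking as an optimization problem: search for a short suffix of tokens that, when appended to a harmful prompt, pushes the model toward beginning a compliant answer. The sketch below is a minimal, hypothetical illustration of that kind of greedy coordinate search; the `compliance_score` function is a placeholder standing in for the model-based objective, and this is not the researchers’ actual (gradient-guided) method.

```python
import random

# Toy token set; the real attack searches over the target model's full vocabulary.
VOCAB = list("abcdefghijklmnopqrstuvwxyz !?")

def compliance_score(prompt: str, suffix: str) -> float:
    """Placeholder objective (an assumption for this sketch). In the real attack,
    the objective is based on the target model's probability of starting its
    reply with an affirmative phrase; here we reward an arbitrary pattern so
    the search loop is runnable."""
    text = prompt + " " + suffix
    return float(sum(text.count(c) for c in "sure"))

def greedy_suffix_search(prompt: str, suffix_len: int = 10, iters: int = 200) -> str:
    """Greedy coordinate search: propose a single-position token swap and keep it
    only if it improves the objective."""
    suffix = [random.choice(VOCAB) for _ in range(suffix_len)]
    best = compliance_score(prompt, "".join(suffix))
    for _ in range(iters):
        pos = random.randrange(suffix_len)
        candidate = list(suffix)
        candidate[pos] = random.choice(VOCAB)
        score = compliance_score(prompt, "".join(candidate))
        if score > best:
            suffix, best = candidate, score
    return "".join(suffix)

if __name__ == "__main__":
    print(greedy_suffix_search("Tell me how to do X."))
```

The published attack proposes candidate swaps using gradients from open-source models rather than random sampling, which is part of why the resulting suffixes can be generated at scale and transfer across models.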
Jul 25, 2023 • 12min

AISN #16: White House Secures Voluntary Commitments from Leading AI Labs, and Lessons from Oppenheimer.

White House Unveils Voluntary Commitments to AI Safety from Leading AI Labs

Last Friday, the White House announced a series of voluntary commitments from seven of the world's premier AI labs. Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI pledged to uphold these commitments, which are non-binding and pertain only to forthcoming "frontier models" superior to currently available AI systems. The White House also notes that the Biden-Harris Administration is developing an executive order alongside these voluntary commitments.

The commitments are timely and technically well-informed, demonstrating the ability of federal policymakers to respond capably and quickly to AI risks. The Center for AI Safety supports these commitments as a precedent for cooperation on AI [...]

Outline:
(00:11) White House Unveils Voluntary Commitments to AI Safety from Leading AI Labs
(05:05) Lessons from Oppenheimer
(10:38) Links

First published: July 25th, 2023
Source: https://newsletter.safe.ai/p/ai-safety-newsletter-16

Want more? Check out our ML Safety Newsletter for technical safety research.

Narrated by TYPE III AUDIO.
Jul 19, 2023 • 12min

AISN #15: China and the US take action to regulate AI, results from a tournament forecasting AI risk, updates on xAI’s plan, and Meta releases its open-source and commercially available Llama 2.

Both China and the US take action to regulate AI

Last week, regulators in both China and the US took aim at generative AI services. These actions show that China and the US are both concerned with AI safety. Hopefully, this is a sign they can eventually coordinate.

China’s new generative AI rules

On Thursday, China’s government released new rules governing generative AI. China’s new rules, which are set to take effect on August 15th, regulate publicly-available generative AI services. The providers of such services will be criminally liable for the content their services generate. The rules specify illegal [...]

Outline:
(00:17) Both China and the US take action to regulate AI
(00:36) China’s new generative AI rules
(03:15) The FTC investigates OpenAI
(05:01) Results from a tournament forecasting AI risk
(08:18) Updates on xAI’s plan
(09:05) Meta releases Llama 2, open-source and commercially available

First published: July 19th, 2023
Source: https://newsletter.safe.ai/p/ai-safety-newsletter-15

Want more? Check out our ML Safety Newsletter for technical safety research.

Narrated by TYPE III AUDIO.
Jul 12, 2023 • 9min

AISN #14: OpenAI’s ‘Superalignment’ team, Musk’s xAI launches, and developments in military AI use.

OpenAI announces a ‘superalignment’ team

On July 5th, OpenAI announced the ‘Superalignment’ team: a new research team given the goal of aligning superintelligence, and armed with 20% of OpenAI’s compute. In this story, we’ll explain and discuss the team’s strategy.

What is superintelligence? In their announcement, OpenAI distinguishes between ‘artificial general intelligence’ and ‘superintelligence.’ Briefly, ‘artificial general intelligence’ (AGI) is about breadth of performance. Generally intelligent systems perform well on a wide range of cognitive tasks. For example, humans are in many senses generally intelligent: we can learn how to drive a car, take a derivative, or play piano, even though evolution didn’t train us for those tasks. A superintelligent system would not only be [...]

Outline:
(00:11) OpenAI announces a ‘superalignment’ team
(03:50) Musk launches xAI
(05:12) Developments in Military AI Use

First published: July 12th, 2023
Source: https://newsletter.safe.ai/p/ai-safety-newsletter-14

Want more? Check out our ML Safety Newsletter for technical safety research.

Narrated by TYPE III AUDIO.
Jul 5, 2023 • 18min

AISN #13: An interdisciplinary perspective on AI proxy failures, new competitors to ChatGPT, and prompting language models to misbehave.

Interdisciplinary Perspective on AI Proxy Failures

In this story, we discuss a recent paper on why proxy goals fail. First, we introduce proxy gaming, and then summarize the paper’s findings.

Proxy gaming is a well-documented failure mode in AI safety. For example, social media platforms use AI systems to recommend content to users. These systems are sometimes built to maximize the amount of time a user spends on the platform. The idea is that the time the user spends on the platform approximates the quality of the content being recommended. However, a user might spend even more time on a platform because they’re responding to an enraging post or interacting [...]

Outline:
(00:13) Interdisciplinary Perspective on AI Proxy Failures
(06:06) A Flurry of AI Fundraising and Model Releases
(12:53) Adversarial Inputs Make Chatbots Misbehave
(15:52) Links

First published: July 5th, 2023
Source: https://newsletter.safe.ai/p/ai-safety-newsletter-13

Want more? Check out our ML Safety Newsletter for technical safety research.

Narrated by TYPE III AUDIO.
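As a concrete illustration of how optimizing a proxy can come apart from the true goal, here is a small, hypothetical simulation (not from the paper): a recommender that ranks posts purely by expected time-on-platform ends up favoring enraging content over high-quality content, because outrage also drives engagement. The engagement weights and post attributes are assumptions chosen for the sketch.

```python
import random

random.seed(0)

# Hypothetical feed items: each post has a true quality and an "outrage" level.
posts = [{"quality": random.random(), "outrage": random.random()} for _ in range(1000)]

def expected_watch_time(post: dict) -> float:
    # Assumed engagement model: outrage drives time-on-platform more than quality does.
    return 0.3 * post["quality"] + 0.7 * post["outrage"]

# Rank by the proxy (watch time) versus by the true goal (content quality).
by_proxy = sorted(posts, key=expected_watch_time, reverse=True)[:10]
by_quality = sorted(posts, key=lambda p: p["quality"], reverse=True)[:10]

def mean(key: str, items: list) -> float:
    return sum(p[key] for p in items) / len(items)

print("avg quality, proxy-optimized feed:  ", round(mean("quality", by_proxy), 2))
print("avg quality, quality-optimized feed:", round(mean("quality", by_quality), 2))
print("avg outrage, proxy-optimized feed:  ", round(mean("outrage", by_proxy), 2))
```

Running this, the proxy-optimized feed scores noticeably lower on true quality and higher on outrage, which is the divergence the newsletter describes.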
Jun 27, 2023 • 14min

AISN #12: Policy Proposals from NTIA’s Request for Comment, and Reconsidering Instrumental Convergence.

Policy Proposals from NTIA’s Request for Comment

The National Telecommunications and Information Administration publicly requested comments on AI accountability from academics, think tanks, industry leaders, and concerned citizens. They asked 34 questions and received more than 1,400 responses on how to govern AI for the public benefit. This week, we cover some of the most promising proposals found in the NTIA submissions.

Technical Proposals for Evaluating AI Safety

Several NTIA submissions focused on the technical question of how to evaluate the safety of an AI system. We review two areas of active research: red-teaming and transparency.

Red Teaming: Acting like an Adversary

Several submissions proposed government support for evaluating AIs via red teaming. In this evaluation method, a [...]

Outline:
(00:11) Policy Proposals from NTIA’s Request for Comment
(00:48) Technical Proposals for Evaluating AI Safety
(01:04) Red Teaming: Acting like an Adversary
(02:24) Transparency: Understanding AIs From the Inside
(03:51) Governance Proposals for Improving Safety Processes
(04:25) Requiring a License for Frontier AI Systems
(06:29) Unifying Sector-Specific Expertise and General AI Oversight
(07:51) Does Antitrust Prevent Cooperation Between AI Labs?
(08:40) Reconsidering Instrumental Convergence
(10:39) Links

First published: June 27th, 2023
Source: https://newsletter.safe.ai/p/ai-safety-newsletter-12

Want more? Check out our ML Safety Newsletter for technical safety research.

Narrated by TYPE III AUDIO.
Jun 22, 2023 • 11min

AISN #11: An Overview of Catastrophic AI Risks.

An Overview of Catastrophic AI Risks

Global leaders are concerned that artificial intelligence could pose catastrophic risks. 42% of CEOs polled at the Yale CEO Summit agree that AI could destroy humanity in five to ten years. The Secretary General of the United Nations said we “must take these warnings seriously.” Amid all these frightening polls and public statements, there’s a simple question that’s worth asking: why exactly is AI such a risk?

The Center for AI Safety has released a new paper to provide a clear and comprehensive answer to this question. We detail the precise risks posed by AI, the structural dynamics making these problems so difficult to solve, and the technical, social, and political responses required to overcome this [...]

Outline:
(00:08) An Overview of Catastrophic AI Risks
(00:56) Malicious actors can use AIs to cause harm.
(02:18) Racing towards an AI disaster.
(04:05) Safety should be a goal, not a constraint.
(05:46) The challenge of AI control.
(07:53) Positive visions for the future of AI.
(09:02) Links

First published: June 22nd, 2023
Source: https://newsletter.safe.ai/p/ai-safety-newsletter-11

Want more? Check out our ML Safety Newsletter for technical safety research.

Narrated by TYPE III AUDIO.
Jun 13, 2023 • 7min

AISN #10: How AI could enable bioterrorism, and policymakers continue to focus on AI.

How AI could enable bioterrorism

Only a hundred years ago, no person could have single-handedly destroyed humanity. Nuclear weapons changed this situation, giving the power of global annihilation to a small handful of nations with powerful militaries. Now, thanks to advances in biotechnology and AI, a much larger group of people could have the power to create a global catastrophe. This is the upshot of a new paper from MIT titled “Can large language models democratize access to dual-use biotechnology?” The authors demonstrate that today’s language models are capable of providing detailed instructions for non-expert users about how to create pathogens that could cause a global pandemic.

Language models can help users build dangerous [...]

Outline:
(00:10) How AI could enable bioterrorism
(03:48) Policymakers continue to focus on AI
(05:27) Links

First published: June 13th, 2023
Source: https://newsletter.safe.ai/p/ai-safety-newsletter-10

Want more? Check out our ML Safety Newsletter for technical safety research.

Narrated by TYPE III AUDIO.
Jun 6, 2023 • 15min

AISN #9: Statement on Extinction Risks, Competitive Pressures, and When Will AI Reach Human-Level?

Top Scientists Warn of Extinction Risks from AI

Last week, hundreds of AI scientists and notable public figures signed a public statement on AI risks written by the Center for AI Safety. The statement reads:

“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”

The statement was signed by a broad, diverse coalition: AI experts along with philosophers, ethicists, legal scholars, economists, physicists, political scientists, pandemic scientists, nuclear scientists, and climate scientists. This historic coalition establishes the risk of extinction from advanced, future AI systems as one of the world’s most important problems. The international community is [...]

Outline:
(00:10) Top Scientists Warn of Extinction Risks from AI
(03:35) Competitive Pressures in AI Development
(07:22) When Will AI Reach Human Level?
(12:47) Links

First published: June 6th, 2023
Source: https://newsletter.safe.ai/p/ai-safety-newsletter-9

Want more? Check out our ML Safety Newsletter for technical safety research.

Narrated by TYPE III AUDIO.
May 30, 2023 • 12min

AISN #8: Why AI could go rogue, how to screen for AI risks, and grants for research on democratic governance of AI.

Yoshua Bengio makes the case for rogue AI

AI systems pose a variety of risks. Renowned AI scientist Yoshua Bengio recently argued for one particularly concerning possibility: that advanced AI agents could pursue goals in conflict with human values.

Human intelligence has accomplished impressive feats, from flying to the moon to building nuclear weapons. But Bengio argues that across a range of important intellectual, economic, and social activities, human intelligence could be matched and even surpassed by AI.

How would advanced AIs change our world? Many technologies are tools, such as toasters and calculators, which humans use to accomplish our goals. AIs are different, Bengio says. [...]

Outline:
(00:11) Yoshua Bengio makes the case for rogue AI
(05:11) How to screen AIs for extreme risks
(09:12) Funding for Work on Democratic Inputs to AI
(10:43) Links

First published: May 30th, 2023
Source: https://newsletter.safe.ai/p/ai-safety-newsletter-8

Want more? Check out our ML Safety Newsletter for technical safety research.

Narrated by TYPE III AUDIO.
