

AI Safety Newsletter
Center for AI Safety
Narrations of the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.
This podcast also contains narrations of some of our publications.
ABOUT US
The Center for AI Safety (CAIS) is a San Francisco-based research and field-building nonprofit. We believe that artificial intelligence has the potential to profoundly benefit the world, provided that we can develop and use it safely. However, in contrast to the dramatic progress in AI, many basic problems in AI safety have yet to be solved. Our mission is to reduce societal-scale risks associated with AI by conducting safety research, building the field of AI safety researchers, and advocating for safety standards.
Learn more at https://safe.ai
Episodes

Oct 31, 2023 • 12min
AISN #25: White House Executive Order on AI, UK AI Safety Summit, and Progress on Voluntary Evaluations of AI Risks.
Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

White House Executive Order on AI
While Congress has not voted on significant AI legislation this year, the White House has left its mark on AI policy. In June, it secured voluntary commitments on safety from leading AI companies. Now, the White House has released a new executive order on AI. It addresses a wide range of issues and specifically targets catastrophic AI risks such as cyberattacks and biological weapons.

Companies must disclose large training runs. Under the executive order, companies that intend to train “dual-use foundation models” using significantly more computing power than GPT-4 must take several precautions. First, they must notify the White House before training begins. Then [...]
---
Outline:
(00:13) White House Executive Order on AI
(03:56) Kicking Off The UK AI Safety Summit
(06:18) Progress on Voluntary Evaluations of AI Risks
(08:52) Links
---
First published:
October 31st, 2023
Source:
https://newsletter.safe.ai/p/ai-safety-newsletter-25
---
Want more? Check out our ML Safety Newsletter for technical safety research.
Narrated by TYPE III AUDIO.

Oct 18, 2023 • 13min
AISN #24: Kissinger Urges US-China Cooperation on AI, China’s New AI Law, US Export Controls, International Institutions, and Open Source AI.
Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

China's New AI Law, US Export Controls, and Calls for Bilateral Cooperation
China details how AI providers can fulfill their legal obligations. The Chinese government has passed several laws on AI. It has regulated recommendation algorithms and taken steps to mitigate the risk of deepfakes. Most recently, it issued a new law governing generative AI. The law is less stringent than an earlier draft version, but it remains more comprehensive than any AI regulation passed in the US, UK, or European Union. It creates legal obligations for AI providers to respect intellectual property rights, avoid discrimination, and uphold socialist values. But as with many AI policy proposals, these are [...]
---
Outline:
(00:15) China's New AI Law, US Export Controls, and Calls for Bilateral Cooperation
(04:58) Proposed International Institutions for AI
(08:15) Open Source AI: Risks and Opportunities
(11:25) Links
---
First published:
October 18th, 2023
Source:
https://newsletter.safe.ai/p/ai-safety-newsletter-24
---
Want more? Check out our ML Safety Newsletter for technical safety research.
Narrated by TYPE III AUDIO.

Oct 4, 2023 • 10min
AISN #23: New OpenAI Models, News from Anthropic, and Representation Engineering.
Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

OpenAI releases GPT-4 with Vision and DALL·E-3, announces Red Teaming Network
GPT-4 with vision and voice. When GPT-4 was initially announced in March, OpenAI demonstrated its ability to process and discuss images such as diagrams or photographs. This feature has now been integrated into GPT-4V. Users can now input images in addition to text, and the model will respond to both. Users can also speak to GPT-4V, and the model will respond verbally.

GPT-4V may be more vulnerable to misuse via jailbreaks and adversarial attacks. Previous research has shown that multimodal models, which can process multiple forms of input such as text and images, are more vulnerable to adversarial attacks than text-only models. GPT-4V's System Card includes some experiments [...]
---
Outline:
(00:11) OpenAI releases GPT-4 with Vision and DALL·E-3, announces Red Teaming Network
(02:39) Writers Guild of America Receives Protections Against AI Automation
(03:42) Anthropic receives $1.25B investment from Amazon, and announces several new policies
(06:21) Representation Engineering: A Top-Down Approach to AI Transparency
(07:57) Links
---
First published:
October 4th, 2023
Source:
https://newsletter.safe.ai/p/ai-safety-newsletter-23
---
Want more? Check out our ML Safety Newsletter for technical safety research.
Narrated by TYPE III AUDIO.

Sep 5, 2023 • 10min
AISN #21: Google DeepMind’s GPT-4 Competitor, Military Investments in Autonomous Drones, The UK AI Safety Summit, and Case Studies in AI Policy.
Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

Google DeepMind's GPT-4 Competitor
Computational power is a key driver of AI progress, and a new report suggests that Google's upcoming GPT-4 competitor will be trained on unprecedented amounts of compute. The model, currently named Gemini, may be trained by the end of this year with 5x more computational power than GPT-4. The report projects that by the end of next year, Google will have the ability to train a model with 20x more compute than GPT-4. For reference, the compute difference between GPT-3 and GPT-4 was 100x. If these projections are accurate, Google's new models could represent a substantial jump over current AI capabilities. Google's position [...]
---
Outline:
(00:14) Google DeepMind's GPT-4 Competitor
(02:41) US Military Invests in Thousands of Autonomous Drones
(04:37) United Kingdom Prepares for Global AI Safety Summit
(06:15) Case Studies in AI Policy
(08:55) Links
---
First published:
September 5th, 2023
Source:
https://newsletter.safe.ai/p/ai-safety-newsletter-21
---
Want more? Check out our ML Safety Newsletter for technical safety research.
Narrated by TYPE III AUDIO.

Aug 29, 2023 • 16min
AISN #20: LLM Proliferation, AI Deception, and Continuing Drivers of AI Capabilities.
AI Deception: Examples, Risks, Solutions
AI deception is the topic of a new paper from researchers at, and affiliated with, the Center for AI Safety. It surveys empirical examples of AI deception, then explores societal risks and potential solutions.

The paper defines deception as “the systematic production of false beliefs in others as a means to accomplish some outcome other than the truth.” Importantly, this definition doesn't necessarily imply that AIs have beliefs or intentions. Instead, it focuses on patterns of behavior that regularly cause false beliefs and would be considered deceptive if exhibited by humans.

Deception by Meta's CICERO AI. Meta developed the AI system CICERO to play Diplomacy, a game where players build and betray alliances in [...]
---
Outline:
(00:11) AI Deception: Examples, Risks, Solutions
(04:35) Proliferation of Large Language Models
(09:25) Continuing Drivers of AI Capabilities
(14:30) Links
---
First published:
August 29th, 2023
Source:
https://newsletter.safe.ai/p/ai-safety-newsletter-20
---
Want more? Check out our ML Safety Newsletter for technical safety research.
Narrated by TYPE III AUDIO.

Aug 21, 2023 • 3h 3min
[Paper] “An Overview of Catastrophic AI Risks” by Dan Hendrycks, Mantas Mazeika and Thomas Woodside
Rapid advancements in artificial intelligence (AI) have sparked growing concerns among experts, policymakers, and world leaders regarding the potential for increasingly advanced AI systems to pose catastrophic risks. Although numerous risks have been detailed separately, there is a pressing need for a systematic discussion and illustration of the potential dangers to better inform efforts to mitigate them. This paper provides an overview of the main sources of catastrophic AI risks, which we organize into four categories: malicious use, in which individuals or groups intentionally use AIs to cause harm; AI race, in which competitive environments compel actors to deploy unsafe AIs or cede control to AIs; organizational risks, highlighting how human factors and complex systems can increase the chances of catastrophic accidents; and rogue AIs, describing the inherent difficulty [...] ---
First published:
June 21st, 2023
Source:
https://arxiv.org/abs/2306.12001
---
Want more? Check out our ML Safety Newsletter for technical safety research.
Narrated by TYPE III AUDIO.

Aug 21, 2023 • 40min
[Paper] “X-Risk Analysis for AI Research” by Dan Hendrycks and Mantas Mazeika
Artificial intelligence (AI) has the potential to greatly improve society, but as with any powerful technology, it comes with heightened risks and responsibilities. Current AI research lacks a systematic discussion of how to manage long-tail risks from AI systems, including speculative long-term risks. Keeping in mind the potential benefits of AI, there is some concern that building ever more intelligent and powerful AI systems could eventually result in systems that are more powerful than us; some say this is like playing with fire and speculate that this could create existential risks (x-risks). To add precision and ground these discussions, we provide a guide for how to analyze AI x-risk, which consists of three parts: First, we review how systems can be made safer today, drawing on time-tested concepts from hazard analysis and systems safety that have been designed to steer large processes in safer directions. Next, we discuss strategies [...] ---
First published:
October 22nd, 2022
Source:
https://arxiv.org/abs/2206.05862
---
Want more? Check out our ML Safety Newsletter for technical safety research.
Narrated by TYPE III AUDIO.

Aug 21, 2023 • 53min
[Paper] “Unsolved Problems in ML Safety” by Dan Hendrycks, Nicholas Carlini, John Schulman and Jacob Steinhardt
Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings. As with other powerful technologies, safety for ML should be a leading research priority. In response to emerging safety challenges in ML, such as those introduced by recent large-scale models, we provide a new roadmap for ML Safety and refine the technical problems that the field needs to address. We present four problems ready for research, namely withstanding hazards (“Robustness”), identifying hazards (“Monitoring”), steering ML systems (“Alignment”), and reducing deployment hazards (“Systemic Safety”). Throughout, we clarify each problem’s motivation and provide concrete research directions. ---
First published:
June 16th, 2022
Source:
https://arxiv.org/abs/2109.13916
---
Want more? Check out our ML Safety Newsletter for technical safety research.
Narrated by TYPE III AUDIO.

Aug 15, 2023 • 11min
AISN #19: US-China Competition on AI Chips, Measuring Language Agent Developments, Economic Analysis of Language Model Propaganda, and White House AI Cyber Challenge.
US-China Competition on AI Chips
Modern AI systems are trained on advanced computer chips, which are designed and fabricated by only a handful of companies in the world. The US and China have been competing for access to these chips for years. Last October, the Biden administration partnered with international allies to severely limit China's access to leading AI chips.

Recently, there have been several notable developments on AI chips. China has made several efforts to preserve its chip access, including smuggling, buying chips that fall just under the legal performance limit, and investing in its domestic chip industry. Meanwhile, the United States has struggled [...]
---
Outline:
(00:15) US-China Competition on AI Chips
(04:09) Measuring Language Agent Developments
(06:07) An Economic Analysis of Language Model Propaganda
(08:11) White House Competition Applying AI to Cybersecurity
(09:40) Links
---
First published:
August 15th, 2023
Source:
https://newsletter.safe.ai/p/ai-safety-newsletter-19
---
Want more? Check out our ML Safety Newsletter for technical safety research.
Narrated by TYPE III AUDIO.

Aug 8, 2023 • 11min
AISN #18: Challenges of Reinforcement Learning from Human Feedback, Microsoft’s Security Breach, and Conceptual Research on AI Safety.
Challenges of Reinforcement Learning from Human Feedback
If you've used ChatGPT, you might've noticed the “thumbs up” and “thumbs down” buttons next to each of its answers. Pressing these buttons provides data that OpenAI uses to improve its models through a technique called reinforcement learning from human feedback (RLHF).

RLHF is popular for teaching models about human preferences, but it faces fundamental limitations. Different people have different preferences, but instead of modeling the diversity of human values, RLHF trains models to earn the approval of whoever happens to give feedback. Furthermore, as AI systems become more capable, they can learn to deceive human evaluators into giving undue approval.

Here we discuss a new [...]
---
Outline:
(00:13) Challenges of Reinforcement Learning from Human Feedback
(05:26) Microsoft's Security Breach
(06:59) Conceptual Research on AI Safety
(09:25) Links
---
First published:
August 8th, 2023
Source:
https://newsletter.safe.ai/p/ai-safety-newsletter-18
---
Want more? Check out our ML Safety Newsletter for technical safety research.
Narrated by TYPE III AUDIO.