AI Safety Fundamentals

BlueDot Impact

Listen to resources from the AI Safety Fundamentals courses!https://aisafetyfundamentals.com/

Episodes

Mentioned books

Sep 3, 2025 • 10min

The Gentle Singularity

By Sam AltmanThis blog post offers a vivid, optimistic vision of rapid AI progress from the CEO of OpenAI. Altman suggests that the accelerating technological change will feel "impressive but manageable," and that there are serious challenges to confront/Source: https://blog.samaltman.com/the-gentle-singularityA podcast by BlueDot Impact.Learn more on the AI Safety Fundamentals website.

Sep 3, 2025 • 13min

Solarpunk: A Vision for a Sustainable Future

By Joshua KrookWhat might sustainable human progress look like, beyond pure technological acceleration? This essay provides an alternative vision, based on communities living in greater harmony with each other and with nature, alongside advanced technologies.Source:https://newintrigue.com/2025/01/29/solarpunk-a-vision-for-a-sustainable-future/A podcast by BlueDot Impact.Learn more on the AI Safety Fundamentals website.

Sep 3, 2025 • 17min

In Search of a Dynamist Vision for Safe Superhuman AI

By Helen TonerThis essay describes AI safety policies that rely on centralised control (surveillance, fewer AI projects, licensing regimes) as "stasist" approaches that sacrifice innovation for stability. Toner argues we need "dynamist" solutions to the risks from AI that allow for decentralised experimentation, creativity and risk-taking.Source:https://helentoner.substack.com/p/dynamism-vs-stasisA podcast by BlueDot Impact.Learn more on the AI Safety Fundamentals website.

Sep 3, 2025 • 17min

It’s Practically Impossible to Run a Big AI Company Ethically

By Sigal Samuel (Vox Future Perfect)Even "safety-first" AI companies like Anthropic face market pressure that can override ethical commitments. This article demonstrates the constraints facing AI companies, and why voluntary corporate governance can't solve coordination problems alone.Source:https://archive.ph/A podcast by BlueDot Impact.Learn more on the AI Safety Fundamentals website.

Sep 3, 2025 • 18min

Seeking Stability in the Competition for AI Advantage

By Iskander Rehman, Karl P. Mueller, Michael J. MazarrThis RAND article describes some of the international dynamics driving the race to AGI between the US and China, and analyses whether nuclear deterrence logic applies to this race.Source: https://www.rand.org/pubs/commentary/2025/03/seeking-stability-in-the-competition-for-ai-advantage.htmlA podcast by BlueDot Impact.Learn more on the AI Safety Fundamentals website.

Aug 30, 2025 • 2h 10min

AI-Enabled Coups: How a Small Group Could Use AI to Seize Power

By Tom Davidson, Lukas Finnveden and Rose Hadshar. The development of AI that is more broadly capable than humans will create a new and serious threat: AI-enabled coups. An AI-enabled coup could be staged by a very small group, or just a single person, and could occur even in established democracies. Sufficiently advanced AI will introduce three novel dynamics that significantly increase coup risk. Firstly, military and government leaders could fully replace human personnel with AI systems that are singularly loyal to them, eliminating the need to gain human supporters for a coup. Secondly, leaders of AI projects could deliberately build AI systems that are secretly loyal to them, for example fully autonomous military robots that pass security tests but later execute a coup when deployed in military settings. Thirdly, senior officials within AI projects or the government could gain exclusive access to superhuman capabilities in weapons development, strategic planning, persuasion, and cyber offence, and use these to increase their power until they can stage a coup. To address these risks, AI projects should design and enforce rules against AI misuse, audit systems for secret loyalties, and share frontier AI systems with multiple stakeholders. Governments should establish principles for government use of advanced AI, increase oversight of frontier AI projects, and procure AI for critical systems from multiple independent providers.Source: https://www.forethought.org/research/ai-enabled-coups-how-a-small-group-could-use-ai-to-seize-powerA podcast by BlueDot Impact.Learn more on the AI Safety Fundamentals website.

Jan 4, 2025 • 12min

Logical Induction (Blog Post)

MIRI is releasing a paper introducing a new model of deductively limited reasoning: “Logical induction,” authored by Scott Garrabrant, Tsvi Benson-Tilsen, Andrew Critch, myself, and Jessica Taylor. Readers may wish to start with the abridged version. Consider a setting where a reasoner is observing a deductive process (such as a community of mathematicians and computer programmers) and waiting for proofs of various logical claims (such as the abc conjecture, or “this computer program has a bug in it”), while making guesses about which claims will turn out to be true. Roughly speaking, our paper presents a computable (though inefficient) algorithm that outpaces deduction, assigning high subjective probabilities to provable conjectures and low probabilities to disprovable conjectures long before the proofs can be produced. This algorithm has a large number of nice theoretical properties. Still speaking roughly, the algorithm learns to assign probabilities to sentences in ways that respect any logical or statistical pattern that can be described in polynomial time. Additionally, it learns to reason well about its own beliefs and trust its future beliefs while avoiding paradox. Quoting from the abstract: "These properties and many others all follow from a single logical induction criterion, which is motivated by a series of stock trading analogies. Roughly speaking, each logical sentence φ is associated with a stock that is worth $1 per share if φ is true and nothing otherwise, and we interpret the belief-state of a logically uncertain reasoner as a set of market prices, where ℙn(φ)=50% means that on day n, shares of φ may be bought or sold from the reasoner for 50¢. The logical induction criterion says (very roughly) that there should not be any polynomial-time computable trading strategy with finite risk tolerance that earns unbounded profits in that market over time."Original text:https://intelligence.org/2016/09/12/new-paper-logical-induction/Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.---A podcast by BlueDot Impact.Learn more on the AI Safety Fundamentals website.

Jan 4, 2025 • 32min

Feature Visualization

There is a growing sense that neural networks need to be interpretable to humans. The field of neural network interpretability has formed in response to these concerns. As it matures, two major threads of research have begun to coalesce: feature visualization and attribution. This article focuses on feature visualization. While feature visualization is a powerful tool, actually getting it to work involves a number of details. In this article, we examine the major issues and explore common approaches to solving them. We find that remarkably simple methods can produce high-quality visualizations. Along the way we introduce a few tricks for exploring variation in what neurons react to, how they interact, and how to improve the optimization process.Original text:https://distill.pub/2017/feature-visualization/Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.---A podcast by BlueDot Impact.Learn more on the AI Safety Fundamentals website.

Jan 4, 2025 • 16min

Least-To-Most Prompting Enables Complex Reasoning in Large Language Models

Chain-of-thought prompting has demonstrated remarkable performance on various natural language reasoning tasks. However, it tends to perform poorly on tasks which requires solving problems harder than the exemplars shown in the prompts. To overcome this challenge of easy-to-hard generalization, we propose a novel prompting strategy, least-to-most prompting. The key idea in this strategy is to break down a complex problem into a series of simpler subproblems and then solve them in sequence. Solving each subproblem is facilitated by the answers to previously solved subproblems. Our experimental results on tasks related to symbolic manipulation, compositional generalization, and math reasoning reveal that least-to-most prompting is capable of generalizing to more difficult problems than those seen in the prompts. A notable finding is that when the GPT-3 code-davinci-002 model is used with least-to-most prompting, it can solve the compositional generalization benchmark SCAN in any split (including length split) with an accuracy of at least 99% using just 14 exemplars, compared to only 16% accuracy with chain-of-thought prompting. This is particularly noteworthy because neural-symbolic models in the literature that specialize in solving SCAN are trained on the entire training set containing over 15,000 examples. We have included prompts for all the tasks in the Appendix.Source:https://arxiv.org/abs/2205.10625Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.---A podcast by BlueDot Impact.Learn more on the AI Safety Fundamentals website.

Jan 4, 2025 • 17min

Understanding Intermediate Layers Using Linear Classifier Probes

Abstract:Neural network models have a reputation for being black boxes. We propose to monitor the features at every layer of a model and measure how suitable they are for classification. We use linear classifiers, which we refer to as "probes", trained entirely independently of the model itself. This helps us better understand the roles and dynamics of the intermediate layers. We demonstrate how this can be used to develop a better intuition about models and to diagnose potential problems. We apply this technique to the popular models Inception v3 and Resnet-50. Among other things, we observe experimentally that the linear separability of features increase monotonically along the depth of the model.Original text:https://arxiv.org/pdf/1610.01644.pdfNarrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.---A podcast by BlueDot Impact.Learn more on the AI Safety Fundamentals website.

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

App store banner

Play store banner