

BlueDot Narrated
BlueDot Impact
Audio versions of the core readings, blog posts, and papers from BlueDot courses.
Episodes

Jan 4, 2025 • 13min
Future ML Systems Will Be Qualitatively Different
In 1972, the Nobel prize-winning physicist Philip Anderson wrote the essay "More Is Different". In it, he argues that quantitative changes can lead to qualitatively different and unexpected phenomena. While he focused on physics, one can find many examples of More Is Different in other domains as well, including biology, economics, and computer science. Some examples of More Is Different include:

Uranium. With a bit of uranium, nothing special happens; with a large amount of uranium packed densely enough, you get a nuclear reaction.

DNA. Given only small molecules such as calcium, you can’t meaningfully encode useful information; given larger molecules such as DNA, you can encode a genome.

Water. Individual water molecules aren’t wet. Wetness only occurs due to the interaction forces between many water molecules interspersed throughout a fabric (or other material).

Original text: https://bounded-regret.ghost.io/future-ml-systems-will-be-qualitatively-different/

Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

A podcast by BlueDot Impact.

Jan 4, 2025 • 23min
Progress on Causal Influence Diagrams
By Tom Everitt, Ryan Carey, Lewis Hammond, James Fox, Eric Langlois, and Shane Legg

About 2 years ago, we released the first few papers on understanding agent incentives using causal influence diagrams. This blog post will summarize progress made since then.

What are causal influence diagrams? A key problem in AI alignment is understanding agent incentives. Concerns have been raised that agents may be incentivized to avoid correction, manipulate users, or inappropriately influence their learning. This is particularly worrying as training schemes often shape incentives in subtle and surprising ways. For these reasons, we’re developing a formal theory of incentives based on causal influence diagrams (CIDs).

Source: https://deepmindsafetyresearch.medium.com/progress-on-causal-influence-diagrams-a7a32180b0d1

Narrated for AI Safety Fundamentals by TYPE III AUDIO.

A podcast by BlueDot Impact.
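To make the CID formalism a little more concrete, here is a minimal sketch (in Python, using networkx) of a toy diagram with chance, decision, and utility nodes. The example structure and node names are hypothetical and are not taken from the authors' tooling.

```python
# A minimal sketch (not DeepMind's tooling): representing a toy causal
# influence diagram (CID) as a directed graph, with node attributes marking
# chance, decision, and utility nodes. The structure below is hypothetical.
import networkx as nx

cid = nx.DiGraph()
cid.add_nodes_from([
    ("UserIntent", {"kind": "chance"}),        # what the user actually wants
    ("Recommendation", {"kind": "decision"}),  # the agent's choice
    ("Engagement", {"kind": "utility"}),       # what the agent is rewarded on
])
cid.add_edges_from([
    ("UserIntent", "Recommendation"),  # observation edge into the decision
    ("Recommendation", "Engagement"),  # the decision influences the reward
    ("UserIntent", "Engagement"),      # the reward also depends on the user
])

# Incentive analysis asks which nodes the agent benefits from influencing,
# given the paths from its decision node to its utility node.
assert nx.is_directed_acyclic_graph(cid)
print([n for n, d in cid.nodes(data=True) if d["kind"] == "decision"])
```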

Jan 4, 2025 • 3h 21min
Is Power-Seeking AI an Existential Risk?
This report examines what I see as the core argument for concern about existential risk from misaligned artificial intelligence. I proceed in two stages. First, I lay out a backdrop picture that informs such concern. On this picture, intelligent agency is an extremely powerful force, and creating agents much more intelligent than us is playing with fire -- especially given that if their objectives are problematic, such agents would plausibly have instrumental incentives to seek power over humans. Second, I formulate and evaluate a more specific six-premise argument that creating agents of this kind will lead to existential catastrophe by 2070. On this argument, by 2070: (1) it will become possible and financially feasible to build relevantly powerful and agentic AI systems; (2) there will be strong incentives to do so; (3) it will be much harder to build aligned (and relevantly powerful/agentic) AI systems than to build misaligned (and relevantly powerful/agentic) AI systems that are still superficially attractive to deploy; (4) some such misaligned systems will seek power over humans in high-impact ways; (5) this problem will scale to the full disempowerment of humanity; and (6) such disempowerment will constitute an existential catastrophe. I assign rough subjective credences to the premises in this argument, and I end up with an overall estimate of ~5% that an existential catastrophe of this kind will occur by 2070. (May 2022 update: since making this report public in April 2021, my estimate here has gone up, and is now at >10%).

Source: https://arxiv.org/abs/2206.13353

Narrated for Joe Carlsmith Audio by TYPE III AUDIO.

A podcast by BlueDot Impact.
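To illustrate how rough subjective credences on the six premises combine into a headline figure of roughly 5%, here is a small worked calculation. The individual numbers below are illustrative placeholders chosen to land near 5%, not necessarily the report's exact values.

```python
# Illustrative only: the report multiplies rough subjective credences on its
# six premises. The numbers below are placeholders, not the report's figures.
premise_credences = {
    "powerful agentic AI is feasible by 2070": 0.65,
    "strong incentives to build it": 0.80,
    "alignment is much harder than misalignment": 0.40,
    "some misaligned systems seek power at high impact": 0.65,
    "power-seeking scales to full disempowerment": 0.40,
    "disempowerment is an existential catastrophe": 0.95,
}

p_catastrophe = 1.0
for credence in premise_credences.values():
    p_catastrophe *= credence

print(f"overall credence: {p_catastrophe:.1%}")  # roughly 5%
```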

Jan 4, 2025 • 40min
AI Safety via Debate
Abstract: To make AI systems broadly useful for challenging real-world tasks, we need them to learn complex human goals and preferences. One approach to specifying complex goals asks humans to judge during training which agent behaviors are safe and useful, but this approach can fail if the task is too complicated for a human to directly judge. To help address this concern, we propose training agents via self-play on a zero-sum debate game. Given a question or proposed action, two agents take turns making short statements up to a limit, then a human judges which of the agents gave the most true, useful information. In an analogy to complexity theory, debate with optimal play can answer any question in PSPACE given polynomial-time judges (direct judging answers only NP questions). In practice, whether debate works involves empirical questions about humans and the tasks we want AIs to perform, plus theoretical questions about the meaning of AI alignment. We report results on an initial MNIST experiment where agents compete to convince a sparse classifier, boosting the classifier's accuracy from 59.4% to 88.9% given 6 pixels and from 48.2% to 85.2% given 4 pixels. Finally, we discuss theoretical and practical aspects of the debate model, focusing on potential weaknesses as the model scales up, and we propose future human and computer experiments to test these properties.

Original text: https://arxiv.org/abs/1805.00899

Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

A podcast by BlueDot Impact.
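Below is a minimal, schematic sketch of the debate protocol described in the abstract: two agents alternate short statements and a judge picks a winner. The stand-in agents and judge here are toy placeholders, not trained models from the paper.

```python
# A schematic sketch of the debate setup, not the paper's MNIST experiment.
# `agent_a`, `agent_b`, and `judge` are hypothetical stand-ins for trained models.
from typing import Callable, List

def run_debate(question: str,
               agent_a: Callable[[str, List[str]], str],
               agent_b: Callable[[str, List[str]], str],
               judge: Callable[[str, List[str]], int],
               n_rounds: int = 3) -> int:
    """Two agents alternate short statements; the judge returns 0 or 1 for the winner."""
    transcript: List[str] = []
    for _ in range(n_rounds):
        transcript.append(agent_a(question, transcript))
        transcript.append(agent_b(question, transcript))
    return judge(question, transcript)

# Toy usage with trivial stand-in agents and a keyword-checking judge.
winner = run_debate(
    "Is 7 prime?",
    agent_a=lambda q, t: "7 has no divisors other than 1 and 7.",
    agent_b=lambda q, t: "7 is divisible by 3.",  # a false claim the judge can catch
    judge=lambda q, t: 0 if any("no divisors" in s for s in t) else 1,
)
print("winner:", "agent A" if winner == 0 else "agent B")
```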

Jan 4, 2025 • 6min
Summarizing Books With Human Feedback
To safely deploy powerful, general-purpose artificial intelligence in the future, we need to ensure that machine learning models act in accordance with human intentions. This challenge has become known as the alignment problem.

A scalable solution to the alignment problem needs to work on tasks where model outputs are difficult or time-consuming for humans to evaluate. To test scalable alignment techniques, we trained a model to summarize entire books, as shown in the following samples.

Source: https://openai.com/research/summarizing-books

Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

A podcast by BlueDot Impact.
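A minimal sketch of the general recursive-summarization idea the post describes (summarize chunks, then summarize the joined summaries), assuming a hypothetical summarize(text) helper backed by a language model; this is not OpenAI's implementation.

```python
# A minimal sketch of recursive book summarization. `summarize` is a
# hypothetical stand-in for a language-model call.
from typing import Callable, List

def chunk(text: str, max_chars: int = 4000) -> List[str]:
    """Split a long text into roughly fixed-size pieces."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def summarize_book(text: str,
                   summarize: Callable[[str], str],
                   max_chars: int = 4000) -> str:
    """Summarize chunks, then recursively summarize the joined summaries."""
    if len(text) <= max_chars:
        return summarize(text)
    partial = " ".join(summarize(piece) for piece in chunk(text, max_chars))
    return summarize_book(partial, summarize, max_chars)

# Toy usage with a stand-in "model" that just truncates its input.
print(summarize_book("some very long book text " * 1000,
                     summarize=lambda t: t[:200]))
```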

Jan 4, 2025 • 18min
Superintelligence: Instrumental Convergence
According to the orthogonality thesis, intelligent agents may have an enormous range of possible final goals. Nevertheless, according to what we may term the “instrumental convergence” thesis, there are some instrumental goals likely to be pursued by almost any intelligent agent, because there are some objectives that are useful intermediaries to the achievement of almost any final goal. We can formulate this thesis as follows:

The instrumental convergence thesis: "Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by a broad spectrum of situated intelligent agents."

Original article: https://drive.google.com/file/d/1KewDov1taegTzrqJ4uurmJ2CJ0Y72EU3/view

Author: Nick Bostrom

A podcast by BlueDot Impact.

Jan 4, 2025 • 29min
Why AI Alignment Could Be Hard With Modern Deep Learning
Why would we program AI that wants to harm us? Because we might not know how to do otherwise.

Source: https://www.cold-takes.com/why-ai-alignment-could-be-hard-with-modern-deep-learning/

Crossposted from the Cold Takes Audio podcast.

A podcast by BlueDot Impact.

Jan 4, 2025 • 16min
Least-To-Most Prompting Enables Complex Reasoning in Large Language Models
Chain-of-thought prompting has demonstrated remarkable performance on various natural language reasoning tasks. However, it tends to perform poorly on tasks that require solving problems harder than the exemplars shown in the prompts. To overcome this challenge of easy-to-hard generalization, we propose a novel prompting strategy, least-to-most prompting. The key idea in this strategy is to break down a complex problem into a series of simpler subproblems and then solve them in sequence. Solving each subproblem is facilitated by the answers to previously solved subproblems. Our experimental results on tasks related to symbolic manipulation, compositional generalization, and math reasoning reveal that least-to-most prompting is capable of generalizing to more difficult problems than those seen in the prompts. A notable finding is that when the GPT-3 code-davinci-002 model is used with least-to-most prompting, it can solve the compositional generalization benchmark SCAN in any split (including length split) with an accuracy of at least 99% using just 14 exemplars, compared to only 16% accuracy with chain-of-thought prompting. This is particularly noteworthy because neural-symbolic models in the literature that specialize in solving SCAN are trained on the entire training set containing over 15,000 examples. We have included prompts for all the tasks in the Appendix.

Source: https://arxiv.org/abs/2205.10625

Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

A podcast by BlueDot Impact.
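Here is a minimal sketch of the two-stage least-to-most procedure described above, assuming a hypothetical llm(prompt) completion function; the decomposition and answer formats are simplified relative to the paper's prompts.

```python
# A minimal sketch of least-to-most prompting. `llm` is a hypothetical
# completion function; prompt formats are simplified illustrations.
from typing import Callable, List

def least_to_most(question: str, llm: Callable[[str], str]) -> str:
    # Stage 1: ask the model to decompose the problem into simpler subproblems.
    decomposition = llm(
        "Break the following problem into a numbered list of simpler "
        f"subproblems:\n{question}"
    )
    subproblems = [line for line in decomposition.splitlines() if line.strip()]

    # Stage 2: solve subproblems in order, feeding earlier answers back in.
    solved: List[str] = []
    final_answer = ""
    for sub in subproblems:
        context = "\n".join(solved)
        final_answer = llm(f"{context}\nQ: {sub}\nA:")
        solved.append(f"Q: {sub}\nA: {final_answer}")

    # The answer to the last subproblem is taken as the overall answer.
    return final_answer
```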

Jan 4, 2025 • 7min
Learning From Human Preferences
One step towards building safe AI systems is to remove the need for humans to write goal functions, since using a simple proxy for a complex goal, or getting the complex goal a bit wrong, can lead to undesirable and even dangerous behavior. In collaboration with DeepMind’s safety team, we’ve developed an algorithm which can infer what humans want by being told which of two proposed behaviors is better.

Original article: https://openai.com/research/learning-from-human-preferences

Authors: Dario Amodei, Paul Christiano, Alex Ray

A podcast by BlueDot Impact.
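To illustrate the underlying idea of learning what humans want from pairwise comparisons, here is a small sketch that fits a linear reward function with a Bradley-Terry-style objective on synthetic data. It shows the general approach, not the authors' implementation.

```python
# A minimal sketch of fitting a reward model from pairwise human comparisons.
# The linear reward model and synthetic data are purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_pairs, n_features = 256, 8

# Each comparison: features of behavior A, features of behavior B, and a label
# (1.0 if A was preferred, 0.0 if B was preferred). Synthetic data here.
true_w = rng.normal(size=n_features)
feats_a = rng.normal(size=(n_pairs, n_features))
feats_b = rng.normal(size=(n_pairs, n_features))
labels = (feats_a @ true_w > feats_b @ true_w).astype(float)

w = np.zeros(n_features)
lr = 0.1
for _ in range(500):
    # P(A preferred) = sigmoid(r(A) - r(B)) with a linear reward r(x) = w . x
    logits = (feats_a - feats_b) @ w
    probs = 1.0 / (1.0 + np.exp(-logits))
    # Gradient of the cross-entropy loss with respect to w.
    grad = (feats_a - feats_b).T @ (probs - labels) / n_pairs
    w -= lr * grad

agreement = np.mean(((feats_a - feats_b) @ w > 0) == labels.astype(bool))
print("agreement with human labels:", float(agreement))
```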

Jan 4, 2025 • 17min
Understanding Intermediate Layers Using Linear Classifier Probes
Abstract: Neural network models have a reputation for being black boxes. We propose to monitor the features at every layer of a model and measure how suitable they are for classification. We use linear classifiers, which we refer to as "probes", trained entirely independently of the model itself. This helps us better understand the roles and dynamics of the intermediate layers. We demonstrate how this can be used to develop a better intuition about models and to diagnose potential problems. We apply this technique to the popular models Inception v3 and ResNet-50. Among other things, we observe experimentally that the linear separability of features increases monotonically along the depth of the model.

Original text: https://arxiv.org/pdf/1610.01644.pdf

Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

A podcast by BlueDot Impact.
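A minimal sketch of the probing technique: capture intermediate activations with forward hooks and fit an independent linear classifier per layer. The small untrained MLP and synthetic data below are placeholders for the paper's Inception v3 and ResNet-50 experiments.

```python
# A minimal sketch of linear classifier probes, not the paper's experiments.
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),
)

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Register a hook after each ReLU, i.e. each intermediate representation.
for idx, layer in enumerate(model):
    if isinstance(layer, nn.ReLU):
        layer.register_forward_hook(save_activation(f"layer_{idx}"))

# Synthetic data standing in for a real dataset.
X = torch.randn(512, 20)
y = (X[:, 0] + X[:, 1] > 0).long()

with torch.no_grad():
    model(X)

# Train one probe per layer, entirely independently of the model itself.
for name, feats in activations.items():
    probe = LogisticRegression(max_iter=1000).fit(feats.numpy(), y.numpy())
    print(name, "probe accuracy:", probe.score(feats.numpy(), y.numpy()))
```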


