
AI Safety Fundamentals

Latest episodes

Jan 4, 2025 • 18min

Emerging Processes for Frontier AI Safety

The UK recognises the enormous opportunities that AI can unlock across our economy and our society. However, without appropriate guardrails, such technologies can pose significant risks. The AI Safety Summit will focus on how best to manage the risks from frontier AI such as misuse, loss of control and societal harms. Frontier AI organisations play an important role in addressing these risks and promoting the safety of the development and deployment of frontier AI.

The UK has therefore encouraged frontier AI organisations to publish details on their frontier AI safety policies ahead of the AI Safety Summit hosted by the UK on 1 to 2 November 2023. This will provide transparency regarding how they are putting into practice voluntary AI safety commitments and enable the sharing of safety practices within the AI ecosystem. Transparency of AI systems can increase public trust, which can be a significant driver of AI adoption.

This document complements these publications by providing a potential list of frontier AI organisations' safety policies.

Source: https://www.gov.uk/government/publications/emerging-processes-for-frontier-ai-safety/emerging-processes-for-frontier-ai-safety

Narrated for AI Safety Fundamentals by Perrin Walker.
A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
Jan 4, 2025 • 23min

Challenges in Evaluating AI Systems

Most conversations around the societal impacts of artificial intelligence (AI) come down to discussing some quality of an AI system, such as its truthfulness, fairness, potential for misuse, and so on. We are able to talk about these characteristics because we can technically evaluate models for their performance in these areas. But what many people working inside and outside of AI don't fully appreciate is how difficult it is to build robust and reliable model evaluations. Many of today's existing evaluation suites are limited in their ability to serve as accurate indicators of model capabilities or safety.

At Anthropic, we spend a lot of time building evaluations to better understand our AI systems. We also use evaluations to improve our safety as an organization, as illustrated by our Responsible Scaling Policy. In doing so, we have grown to appreciate some of the ways in which developing and running evaluations can be challenging.

Here, we outline challenges that we have encountered while evaluating our own models to give readers a sense of what developing, implementing, and interpreting model evaluations looks like in practice.

Source: https://www.anthropic.com/news/evaluating-ai-systems

Narrated for AI Safety Fundamentals by Perrin Walker.
A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
Jan 4, 2025 • 21min

AI Control: Improving Safety Despite Intentional Subversion

We've released a paper, AI Control: Improving Safety Despite Intentional Subversion. This paper explores techniques that prevent AI catastrophes even if AI instances are colluding to subvert the safety techniques. In this post:

- We summarize the paper;
- We compare our methodology to the methodology of other safety papers.

Source: https://www.alignmentforum.org/posts/d9FJHawgkiMSPjagR/ai-control-improving-safety-despite-intentional-subversion

Narrated for AI Safety Fundamentals by Perrin Walker.
A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
Jan 4, 2025 • 27min

Computing Power and the Governance of AI

This post summarises a new report, "Computing Power and the Governance of Artificial Intelligence." The full report is a collaboration between nineteen researchers from academia, civil society, and industry. It can be read here.

GovAI research blog posts represent the views of their authors, rather than the views of the organisation.

Source: https://www.governance.ai/post/computing-power-and-the-governance-of-ai

Narrated for AI Safety Fundamentals by Perrin Walker.
A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
Jan 4, 2025 • 10min

Public by Default: How We Manage Information Visibility at Get on Board

I've been obsessed with managing information and communications in a remote team since Get on Board started growing. Reducing the bus factor is a primary motivation, but another just as important is diminishing reliance on synchronicity. When what I know is documented and accessible to others, I'm less likely to be a bottleneck for anyone else in the team. So if I'm busy, minding family matters, on vacation, or sick, I won't be blocking anyone.

This, in turn, gives everyone in the team the freedom to build their own work schedules according to their needs, work from any time zone, or enjoy more distraction-free moments. As I write these lines, most of the world is under quarantine, relying on non-stop video calls to continue working. Needless to say, that is not a sustainable way of working long term.

Original text: https://www.getonbrd.com/blog/public-by-default-how-we-manage-information-visibility-at-get-on-board
Author: Sergio Nouvel

A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
Jan 4, 2025 • 11min

Planning a High-Impact Career: A Summary of Everything You Need to Know in 7 Points

We took 10 years of research and what we've learned from advising 1,000+ people on how to build high-impact careers, compressed that into an eight-week course to create your career plan, and then compressed that into this three-page summary of the main points.

(It's especially aimed at people who want a career that's satisfying and has a significant positive impact, but much of the advice applies to all career decisions.)

Original article: https://80000hours.org/career-planning/summary/
Author: Benjamin Todd

A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
Jan 4, 2025 • 5min

Become a Person who Actually Does Things

The next four weeks of the course are an opportunity for you to actually build a thing that moves you closer to contributing to AI Alignment, and we're really excited to see what you do!

A common failure mode is to think "Oh, I can't actually do X" or to say "Someone else is probably doing Y." You probably can do X, and it's unlikely anyone is doing Y! It could be you!

Original text: https://www.neelnanda.io/blog/become-a-person-who-actually-does-things
Author: Neel Nanda

A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
Jan 4, 2025 • 15min

How to Succeed as an Early-Stage Researcher: The “Lean Startup” Approach

I am approaching the end of my AI governance PhD, and I've spent about 2.5 years as a researcher at FHI. During that time, I've learnt a lot about the formula for successful early-career research.

This post summarises my advice for people in the first couple of years. Research is really hard, and I want people to avoid the mistakes I've made.

Original text: https://forum.effectivealtruism.org/posts/jfHPBbYFzCrbdEXXd/how-to-succeed-as-an-early-stage-researcher-the-lean-startup#Conclusion
Author: Toby Shevlane

A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
Jan 4, 2025 • 1h 9min

Working in AI Alignment

This guide is written for people who are considering direct work on technical AI alignment. I expect it to be most useful for people who are not yet working on alignment, and for people who are already familiar with the arguments for working on AI alignment. If you aren't familiar with the arguments for the importance of AI alignment, you can get an overview of them by doing the AI Alignment Course.

By Charlie Rogers-Smith, with minor updates by Adam Jones.

Source: https://aisafetyfundamentals.com/blog/alignment-careers-guide

Narrated for AI Safety Fundamentals by Perrin Walker.
A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
Jan 4, 2025 • 7min

Being the (Pareto) Best in the World

This introduces the concept of Pareto frontiers. The top comment by Rob Miles also ties it to comparative advantage.

While reading, consider what Pareto frontiers your project could place you on.

Original text: https://www.lesswrong.com/posts/XvN2QQpKTuEzgkZHY/being-the-pareto-best-in-the-world
Author: John Wentworth

A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
