AI Safety Fundamentals: Alignment cover image

AI Safety Fundamentals: Alignment

Worst-Case Thinking in AI Alignment

May 29, 2024
11:35

Alternative title: “When should you assume that what could go wrong, will go wrong?” Thanks to Mary Phuong and Ryan Greenblatt for helpful suggestions and discussion, and Akash Wasil for some edits. In discussions of AI safety, people often propose the assumption that something goes as badly as possible. Eliezer Yudkowsky in particular has argued for the importance of security mindset when thinking about AI alignment. I think there are several distinct reasons that this might be the right assumption to make in a particular situation. But I think people often conflate these reasons, and I think that this causes confusion and mistaken thinking. So I want to spell out some distinctions. Throughout this post, I give a bunch of specific arguments about AI alignment, including one argument that I think I was personally getting wrong until I noticed my mistake yesterday (which was my impetus for thinking about this topic more and then writing this post). I think I’m probably still thinking about some of my object level examples wrong, and hope that if so, commenters will point out my mistakes.

Original text:

https://www.lesswrong.com/posts/yTvBSFrXhZfL8vr5a/worst-case-thinking-in-ai-alignment

Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

---

A podcast by BlueDot Impact.

Learn more on the AI Safety Fundamentals website.

Get the Snipd
podcast app

Unlock the knowledge in podcasts with the podcast player of the future.
App store bannerPlay store banner

AI-powered
podcast player

Listen to all your favourite podcasts with AI-powered features

Discover
highlights

Listen to the best highlights from the podcasts you love and dive into the full episode

Save any
moment

Hear something you like? Tap your headphones to save it with AI-generated key takeaways

Share
& Export

Send highlights to Twitter, WhatsApp or export them to Notion, Readwise & more

AI-powered
podcast player

Listen to all your favourite podcasts with AI-powered features

Discover
highlights

Listen to the best highlights from the podcasts you love and dive into the full episode