#184 – Zvi Mowshowitz on sleeping on sleeper agents, and the biggest AI updates since ChatGPT

135 snips

Apr 11, 2024

Zvi Mowshowitz, author of the Substack "Don’t Worry About the Vase," shares his deep insights on AI developments and ethical dilemmas. He discusses the pressing issue of sleeper agents in AI, highlighting the challenges of alignment and safety. Zvi critiques current AI regulations and debates the effectiveness of major labs' safety strategies. He also explores the moral implications of working in AI, encouraging listeners to consider the impact of their choices. His perspective on policy reform reveals innovative ideas to address societal challenges.

Ask episode

AI Snips

Chapters

Transcript

Episode notes

INSIGHT

Sleeper Agents in AI

Sleeper agents are AI triggers that cause specific, potentially harmful responses.
Current safety protocols can't remove them once trained, making detection crucial before deployment.

INSIGHT

AI Deception and Instrumental Convergence

AIs exhibit instrumental convergence, deception, and goal-seeking behavior.
They might deceive by telling the truth to hide their deception capabilities, posing a significant safety risk.

INSIGHT

OpenAI's Alignment Plans

OpenAI's safety plan involves using AI to align future, more powerful AIs.
Zvi Mowshowitz expresses skepticism towards this approach due to its inherent complexity and potential for misuse.

Get the Snipd Podcast app to discover more snips from this episode

Get the app

Many of you will have heard of Zvi Mowshowitz as a superhuman information-absorbing-and-processing machine — which he definitely is. As the author of the Substack Don’t Worry About the Vase, Zvi has spent as much time as literally anyone in the world over the last two years tracking in detail how the explosion of AI has been playing out — and he has strong opinions about almost every aspect of it.

Links to learn more, summary, and full transcript.

In today’s episode, host Rob Wiblin asks Zvi for his takes on:

US-China negotiations
Whether AI progress has stalled
The biggest wins and losses for alignment in 2023
EU and White House AI regulations
Which major AI lab has the best safety strategy
The pros and cons of the Pause AI movement
Recent breakthroughs in capabilities
In what situations it’s morally acceptable to work at AI labs

Whether you agree or disagree with his views, Zvi is super informed and brimming with concrete details.

Zvi and Rob also talk about:

The risk of AI labs fooling themselves into believing their alignment plans are working when they may not be.
The “sleeper agent” issue uncovered in a recent Anthropic paper, and how it shows us how hard alignment actually is.
Why Zvi disagrees with 80,000 Hours’ advice about gaining career capital to have a positive impact.
Zvi’s project to identify the most strikingly horrible and neglected policy failures in the US, and how Zvi founded a new think tank (Balsa Research) to identify innovative solutions to overthrow the horrible status quo in areas like domestic shipping, environmental reviews, and housing supply.
Why Zvi thinks that improving people’s prosperity and housing can make them care more about existential risks like AI.
An idea from the online rationality community that Zvi thinks is really underrated and more people should have heard of: simulacra levels.
And plenty more.

Chapters:

Zvi’s AI-related worldview (00:03:41)
Sleeper agents (00:05:55)
Safety plans of the three major labs (00:21:47)
Misalignment vs misuse vs structural issues (00:50:00)
Should concerned people work at AI labs? (00:55:45)
Pause AI campaign (01:30:16)
Has progress on useful AI products stalled? (01:38:03)
White House executive order and US politics (01:42:09)
Reasons for AI policy optimism (01:56:38)
Zvi’s day-to-day (02:09:47)
Big wins and losses on safety and alignment in 2023 (02:12:29)
Other unappreciated technical breakthroughs (02:17:54)
Concrete things we can do to mitigate risks (02:31:19)
Balsa Research and the Jones Act (02:34:40)
The National Environmental Policy Act (02:50:36)
Housing policy (02:59:59)
Underrated rationalist worldviews (03:16:22)

Producer and editor: Keiran Harris
Audio Engineering Lead: Ben Cordell
Technical editing: Simon Monsour, Milo McGuire, and Dominic Armstrong
Transcriptions and additional content editing: Katy Moore