
80,000 Hours Podcast
#184 – Zvi Mowshowitz on sleeping on sleeper agents, and the biggest AI updates since ChatGPT
Episode guest: Zvi Mowshowitz
Podcast summary created with Snipd AI
Quick takeaways
- Balancing AI alignment with addressing misuse and governance is crucial for comprehensive AI safety.
- Transparency in high-capacity AI model training is essential to monitor and mitigate potential risks.
- AI products beyond coding, such as Microsoft Office Copilot, show promise but remain underexplored.
- Incremental policy wins and alignment progress are key to ensuring societal benefits from AI advancements.
- Repealing the Jones Act requires navigating union interests and building stakeholder support for a smooth transition.
- Balsa Research focuses on identifying overlooked policy wins for societal improvement and economic outcomes.
Deep dives
The Importance of Addressing Misalignment
Addressing misalignment in AI systems is crucial to ensure that they follow instructions correctly and do not pose threats. Efforts should focus on developing AI systems that align with human values and goals.
Preventing AI Misuse and Structural Issues
Beyond misalignment, it is essential to consider preventing AI misuse, such as dangerous surveillance or military use, and addressing broader structural issues like digital rights and governance. Balancing these concerns with alignment efforts is vital for a holistic approach.
Balancing Resources Across AI Challenges
While misalignment has received significant attention, resources also need to go toward misuse and structural issues. Striking a balance based on impact and urgency, rather than fixed percentage allocations, ensures a comprehensive approach to AI safety and governance.
Implications of AI Training Transparency
The executive order requires reporting of large AI model training runs, aligning with the need for transparency in potentially dangerous AI development. This move ensures that when models reach certain thresholds of capability, there is visibility and an understanding of the safety precautions being taken.
Impact of Coding Assistance Tools
AI coding assistants such as GitHub Copilot have improved programming efficiency, exceeding expectations in their utility. Progress in AI products outside coding, such as Microsoft Office Copilot, has been slower, and their potential benefits remain largely unrealized due to limited exploration and investment in leveraging these tools.
Challenges in AI Progress and Policy
The podcast delves into slower-than-expected progress in AI products, marked by how long GPT-4 has held its state-of-the-art status. The discussion reflects concerns about capabilities research plateauing, and about the policy measures, such as AI training transparency, needed to address potential risks in advanced AI development.
Government's Role in Structuring AI Policy
The executive order's focus on reporting high-capacity AI training aligns with the need for regulatory oversight in advanced AI development. Emphasizing the importance of understanding and monitoring AI capabilities, the government aims to establish a framework for intervention if AI models pose substantial risks to society or require safety precautions.
The Importance of Policy Wins
Focuses on finding dramatic policy wins in the United States that can be achieved with relatively small effort and contribute to a better future. Policy goals include monitoring and regulation of large models, liability considerations, and alignment work to navigate the post-alignment world.
Addressing the Jones Act
Examines the impact of the Jones Act, a law from 1920 that restricts shipping between American ports to American-built, owned, manned, and flagged ships. Highlights how this law hinders American productivity, economy, and the 'reshoring' strategy. Points out how it benefits only a few at the expense of environmental and economic concerns.
Balsa Research's Niche
Describes Balsa Research as a small think tank project focused on identifying potential policy wins and overlooked strategies for improving societal and economic outcomes. Aims to influence policy discussions, lay groundwork, and advocate for changes that benefit a broader population.
The Impact of Grokking on Alignment
Discusses the concept of 'grokking,' where an AI model, after extended training, abruptly shifts from memorizing its training data to genuinely generalizing. Highlights the challenge of anticipating and addressing unexpected post-grokking behaviors that may render traditional alignment techniques ineffective.
Technical Breakthroughs and Policy Wins
Explores the interplay between technical advancements in AI, such as GPT models, and the potential for policy wins through alignment of AI models. Emphasizes the importance of incremental progress in policy, alignment, and governance to enhance societal well-being and AI safety.
Strategies for Policy and AI Advances
Advocates for incremental policy wins and alignment efforts in AI to address societal challenges and advance AI safety. Suggests policy goals, alignment strategies, and governance improvements as key elements for creating a sustainable and beneficial AI future.
Understanding the Jones Act and its Impacts on American Ship Manufacturers
The Jones Act was passed as a protectionist measure to benefit American ship manufacturers and operators. However, the law also served the personal interests of its sponsor, Senator Wesley Jones, who had ties to a specific American shipping company. The Act's prohibitive requirements led to a decline in the American fleet, harming competition and military preparedness.
Challenges and Proposals to Address the Jones Act's Impact
The Jones Act's impact has never been properly quantified, which poses a challenge. Academic studies are needed to assess the Act's effects on job loss, union engagement, and economic activity. A repeal proposal requires comprehensive analysis and support from various stakeholders, including unions, to address their concerns and transition smoothly.
Efforts Towards Jones Act Repeal and Strategic Approaches
Efforts to repeal the Jones Act involve countering misleading studies that inflate its benefits. Strategic planning includes drafting legislation that addresses complementary regulations and appeals to key stakeholders such as unions and environmentalists. By navigating union interests and building a supportive coalition, repeal could become more feasible.
Many of you will have heard of Zvi Mowshowitz as a superhuman information-absorbing-and-processing machine — which he definitely is. As the author of the Substack Don’t Worry About the Vase, Zvi has spent as much time as literally anyone in the world over the last two years tracking in detail how the explosion of AI has been playing out — and he has strong opinions about almost every aspect of it.
Links to learn more, summary, and full transcript.
In today’s episode, host Rob Wiblin asks Zvi for his takes on:
- US-China negotiations
- Whether AI progress has stalled
- The biggest wins and losses for alignment in 2023
- EU and White House AI regulations
- Which major AI lab has the best safety strategy
- The pros and cons of the Pause AI movement
- Recent breakthroughs in capabilities
- In what situations it’s morally acceptable to work at AI labs
Whether you agree or disagree with his views, Zvi is super informed and brimming with concrete details.
Zvi and Rob also talk about:
- The risk of AI labs fooling themselves into believing their alignment plans are working when they may not be.
- The “sleeper agent” issue uncovered in a recent Anthropic paper, and how it shows us how hard alignment actually is.
- Why Zvi disagrees with 80,000 Hours’ advice about gaining career capital to have a positive impact.
- Zvi’s project to identify the most strikingly horrible and neglected policy failures in the US, and how he founded a new think tank (Balsa Research) to identify innovative solutions to overthrow the horrible status quo in areas like domestic shipping, environmental reviews, and housing supply.
- Why Zvi thinks that improving people’s prosperity and housing can make them care more about existential risks like AI.
- An idea from the online rationality community that Zvi thinks is really underrated and more people should have heard of: simulacra levels.
- And plenty more.
Chapters:
- Zvi’s AI-related worldview (00:03:41)
- Sleeper agents (00:05:55)
- Safety plans of the three major labs (00:21:47)
- Misalignment vs misuse vs structural issues (00:50:00)
- Should concerned people work at AI labs? (00:55:45)
- Pause AI campaign (01:30:16)
- Has progress on useful AI products stalled? (01:38:03)
- White House executive order and US politics (01:42:09)
- Reasons for AI policy optimism (01:56:38)
- Zvi’s day-to-day (02:09:47)
- Big wins and losses on safety and alignment in 2023 (02:12:29)
- Other unappreciated technical breakthroughs (02:17:54)
- Concrete things we can do to mitigate risks (02:31:19)
- Balsa Research and the Jones Act (02:34:40)
- The National Environmental Policy Act (02:50:36)
- Housing policy (02:59:59)
- Underrated rationalist worldviews (03:16:22)
Producer and editor: Keiran Harris
Audio Engineering Lead: Ben Cordell
Technical editing: Simon Monsour, Milo McGuire, and Dominic Armstrong
Transcriptions and additional content editing: Katy Moore