Cooperation, Conflict, and Transformative Artificial Intelligence: Sections 1 & 2 — Introduction, Strategy and Governance
May 13, 2023
Jesse Clifton, a researcher on transformative artificial intelligence (TAI), discusses its potential consequences and risks. He explores cooperation failures, social dilemmas, and the strategic landscape and governance of TAI systems. Topics include bargaining models of war, simulated conflict, and the importance of credible commitments in multi-agent systems.
Preventing catastrophic failures of cooperation among TAI systems is crucial to ensure safe and beneficial development of transformative artificial intelligence.
The research agenda in this field should include studying AI misalignment scenarios, analyzing the impact of formal research on real-world decisions, and considering the potential downsides of increased technical understanding and idealized models of rationality.
Main ideas and insights
The podcast discusses the research agenda on Cooperation, Conflict, and Transformative Artificial Intelligence (TAI), which emphasizes preventing catastrophic failures of cooperation among TAI systems. It explores cooperation failures and their potential consequences, including destructive conflict, coercion, and social dilemmas, and highlights the need to study the strategic landscape, AI alignment scenarios, commitment and transparency, and the potential downsides of research on cooperation failures.
AI Strategy and Governance
This section focuses on understanding the strategic landscape among key actors deploying TAI systems and identifying levers for preventing catastrophic cooperation failures. It covers polarity and transition scenarios, commitment and transparency in TAI systems, and the alignment problem of building AI systems that act in accordance with their operators' intentions. It also addresses further directions such as the offense-defense balance, case studies of cooperation failures, and potential downsides of research in this area.
Other Research Directions and Potential Downsides
The research agenda also covers other important directions, including AI misalignment scenarios and the strategic implications of developments in cybersecurity. It suggests studying historical case studies of cooperation failures, analyzing how formal research has influenced real-world decisions, and investigating the dangers and limitations of technical and strategic progress. Lastly, it highlights the need to consider the potential downsides of increased technical understanding and of applying idealized models of rationality in this field.
Transformative artificial intelligence (TAI) may be a key factor in the long-run trajectory of civilization. A growing interdisciplinary community has begun to study how the development of TAI can be made safe and beneficial to sentient life (Bostrom, 2014; Russell et al., 2015; OpenAI, 2018; Ortega and Maini, 2018; Dafoe, 2018). We present a research agenda for advancing a critical component of this effort: preventing catastrophic failures of cooperation among TAI systems. By cooperation failures we refer to a broad class of potentially catastrophic inefficiencies in interactions among TAI-enabled actors. These include destructive conflict; coercion; and social dilemmas (Kollock, 1998; Macy and Flache, 2002), which destroy value over extended periods of time. We introduce cooperation failures at greater length in Section 1.1. Karnofsky (2016) defines TAI as "AI that precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution". Such systems range from the unified, agent-like systems that are the focus of, e.g., Yudkowsky (2013) and Bostrom (2014), to the "comprehensive AI services" envisioned by Drexler (2019), in which humans are assisted by an array of powerful domain-specific AI tools. In our view, the potential consequences of such technology are enough to motivate research into mitigating risks today, despite considerable uncertainty about the timeline to TAI (Grace et al., 2018) and the nature of TAI development.
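The social dilemmas mentioned above can be illustrated with the one-shot Prisoner's Dilemma, the canonical case in which individually rational play destroys value for everyone. The sketch below is illustrative only; the specific payoff numbers are assumptions chosen for the example, not taken from the agenda.

```python
# Minimal sketch of a social dilemma: the one-shot Prisoner's Dilemma.
# Payoff numbers here are illustrative assumptions, not from the agenda.
import itertools

# payoffs[(row_action, col_action)] = (row_payoff, col_payoff)
# "C" = cooperate, "D" = defect
payoffs = {
    ("C", "C"): (3, 3),  # mutual cooperation: efficient outcome
    ("C", "D"): (0, 5),  # cooperator is exploited
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # mutual defection: inefficient outcome
}

def best_response(opponent_action, player):
    """Action maximizing this player's payoff against a fixed opponent action."""
    if player == 0:  # row player
        return max("CD", key=lambda a: payoffs[(a, opponent_action)][0])
    return max("CD", key=lambda a: payoffs[(opponent_action, a)][1])

def pure_nash_equilibria():
    """Pure-strategy profiles from which neither player gains by deviating."""
    return [
        (r, c)
        for r, c in itertools.product("CD", repeat=2)
        if best_response(c, 0) == r and best_response(r, 1) == c
    ]

# Defection strictly dominates, so (D, D) is the unique equilibrium,
# even though both players would prefer (C, C).
print(pure_nash_equilibria())  # [('D', 'D')]
```

The point of the example is the gap between equilibrium behavior and the efficient outcome: each agent's best response is to defect regardless of the other's choice, yet the resulting profile is Pareto-dominated by mutual cooperation. Cooperation failures among TAI-enabled actors are a (much higher-stakes) generalization of this structure.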