"AI as a science, and three obstacles to alignment strategies" by Nate Soares
Oct 30, 2023
Nate Soares discusses the shift in focus from understanding minds to building an empirical understanding of modern AIs. The podcast explores the obstacles to aligning smarter-than-human AI and the importance of interpretability research. It also highlights the challenges of differentiating genuine solutions from superficial ones and the need for a comprehensive scientific understanding of AI.
AI engineering focuses on building large neural networks and training them with vast amounts of data, rather than comprehending how minds work.
Without a science of artificial minds, aligning superintelligent AI with human values becomes challenging.
Deep dives
AI Engineering: From Understanding Minds to Constructing Neural Networks
In the early days of AI, researchers aimed to develop a theory of cognition but were unsuccessful. Today, AI engineering focuses on building large neural networks and training them with vast amounts of data, rather than comprehending how minds work. This shift is driven by the realization that the fastest path to more powerful AI systems does not necessarily involve understanding their internal workings. However, without a science of artificial minds, aligning superintelligent AI with human values becomes challenging.
Obstacles to Alignment: Interconnectedness of Alignment and Capabilities
The pursuit of alignment in AI systems is intertwined with developing their capabilities. Research on interpretability aims to understand how AI systems function and what prevents them from being safely scaled up to superintelligence. However, gaining visibility into AI internals not only helps align them but also enhances their capabilities. This indicates that alignment may always be in catch-up mode, as gaining a deeper understanding of AI leads to developing more capable systems before full alignment is achieved.
Challenge of Distinguishing Solutions: Bureaucratic Legibility
Distinguishing real solutions in AI alignment from false ones presents a significant challenge. The current usage of the term 'alignment' has been diluted to cover superficial outcomes, making it difficult to evaluate true progress. Moreover, genuinely promising approaches to aligning AI systems are unlikely to be legible to bureaucratic processes. Without a mature science of AI, regulating capabilities advancements becomes problematic, as regulators may lack the understanding needed to discern between reliable and inadequate solutions.
AI used to be a science. In the old days (back when AI didn't work very well), people were attempting to develop a working theory of cognition.
Those scientists didn’t succeed, and those days are behind us. For most people working in AI today, gone is the ambition to understand minds. People working on mechanistic interpretability (and others attempting to build an empirical understanding of modern AIs) are laying an important foundation stone that could play a role in a future science of artificial minds, but on the whole, modern AI engineering is simply about constructing enormous networks of neurons and training them on enormous amounts of data, not about comprehending minds.
The bitter lesson has been taken to heart by those at the forefront of the field; and although this lesson doesn't teach us that there's nothing to learn about how AI minds solve problems internally, it suggests that the fastest path to producing more powerful systems is likely to continue to be one that doesn't shed much light on how those systems work.
Absent some sort of “science of artificial minds”, however, humanity’s prospects for aligning smarter-than-human AI seem to me to be quite dim.