Jan Kulveit, who leads the Alignment of Complex Systems research group, dives into the fascinating intersection of AI and human cognition. He discusses active inference, the differences between large language models and the human brain, and how feedback loops influence behavior. The conversation explores hierarchical agency, the complexities of aligning AI with human values, and the philosophical implications of self-awareness in AI. Kulveit also critiques existing frameworks for understanding agency, shedding light on the dynamics of collective behaviors.
Jan Kulveit explains the concept of active inference, highlighting how it contrasts with traditional cognitive models in understanding human cognition.
The discussion explores a key limitation of large language models: they lack the feedback loops that active inference theory treats as essential for learning and adaptation.
Hierarchical agency is emphasized as critical for understanding interactions between humans and AI, and for designing effective AI alignment strategies.
The podcast highlights the importance of cooperative AI in negotiating shared goals, aiming for beneficial interactions in real-world scenarios.
Deep dives
Active Inference and Large Language Models
The discussion centers on a paper that compares large language models (LLMs) to active inference, an approach originating from neuroscience. The authors propose that LLMs might be seen as special cases of active inference systems, suggesting they operate similarly in predicting sensory inputs. They explore how these models lack the feedback loop present in active inference, which is crucial for learning and adaptation. This observation raises questions about the implications of LLMs functioning without a tightly closed feedback loop and how that might shape their responses and interactions.
Understanding Active Inference
Active inference posits that the brain functions by constantly predicting sensory inputs and updating beliefs based on prediction errors. The theory suggests that human cognition is driven by a model that anticipates sensory information, correcting itself when discrepancies arise. This contrasts with classical cognitive models where information flows straightforwardly from sensory input to cognitive processing. The idea is that both perception and action can be understood through a unified framework of predictions and updates, leading to a dynamic interaction with the environment.
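To make the prediction-update idea concrete, here is a minimal Python sketch (not from the episode) of a predictive-coding-style loop: the agent keeps a single belief about a hidden quantity, predicts each sensory observation from that belief, and nudges the belief in proportion to the prediction error. The hidden state, noise level, and learning rate are illustrative assumptions, not values discussed by Kulveit.

```python
import random

# Minimal predictive-coding-style loop (illustrative sketch only).
# The agent tracks one hidden quantity (e.g. ambient temperature),
# predicts each observation from its current belief, and corrects
# the belief in proportion to the prediction error.

TRUE_STATE = 21.0      # hidden state of the world (assumed)
NOISE = 0.5            # sensory noise level (assumed)
LEARNING_RATE = 0.1    # how strongly prediction errors update the belief

def observe() -> float:
    """Noisy sensory sample of the hidden state."""
    return TRUE_STATE + random.gauss(0.0, NOISE)

belief = 15.0  # initial (wrong) belief about the hidden state

for step in range(50):
    prediction = belief              # generative model: belief -> predicted input
    observation = observe()          # actual sensory input
    error = observation - prediction
    belief += LEARNING_RATE * error  # perception: update the belief to reduce error

print(f"final belief: {belief:.2f} (true state {TRUE_STATE})")
```

In this toy setup only perception is at work: the belief moves toward the data. The sketch after the next section shows the complementary half of the unified framework, where action moves the data toward a belief that is held fixed.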
Challenges in Encoding Preferences
The conversation delves into how active inference encodes preferences and drives within its framework. It is suggested that evolutionary influences have shaped these preferences, often in the form of fixed beliefs that drive behavior. The concept of 'fixed priors' is crucial: unlike ordinary beliefs, they are not revised in the light of evidence, so they anchor action in stable expectations about how the world should be. However, the framework struggles to account for how preferences evolve and how they influence action, especially when conflicting desires are present.
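As a rough illustration of how a fixed prior can drive behavior (a sketch under assumed dynamics, not a formalism from the episode), the agent below holds a preferred observation value that it never updates; instead of revising that belief, it acts on the world until its observations match the prior. The temperature setting, noise level, and action gain are hypothetical.

```python
import random

# Sketch: a fixed prior acting as a preference (illustrative assumptions only).
# The agent "expects" the temperature to be 21 degrees. That expectation is
# never updated by evidence; instead the agent acts on the world (heats or
# cools it) until observations come to match the fixed prior.

FIXED_PRIOR = 21.0   # preferred observation, treated as an unshakeable belief
NOISE = 0.3          # sensory noise level (assumed)
ACTION_GAIN = 0.2    # how strongly each action pushes the world (assumed)

world_temperature = 14.0  # actual state the agent can influence

def observe() -> float:
    """Noisy sensory sample of the controllable state."""
    return world_temperature + random.gauss(0.0, NOISE)

for step in range(100):
    observation = observe()
    error = FIXED_PRIOR - observation     # prediction error w.r.t. the fixed prior
    # Action: change the world to reduce the error, rather than updating the prior.
    world_temperature += ACTION_GAIN * error

print(f"world temperature after acting: {world_temperature:.2f} "
      f"(fixed prior {FIXED_PRIOR})")
```

The design choice worth noticing is which side of the equation absorbs the prediction error: in the perception sketch the belief moves, here the world does. Conflicting preferences, which the framework handles less cleanly, would correspond to multiple fixed priors pulling the same variable in different directions.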
Hierarchical Agency in Humans and AI
The notion of hierarchical agency is discussed in the context of both human and AI systems, capturing how entities are composed of sub-agents with varying levels of influence. It is emphasized that understanding these relationships is essential for tackling complex systems, particularly in AI alignment. The conversation points to the difficulties of modeling interactions within this hierarchy, noting that existing frameworks may not fully capture the dynamics at play. The exploration of hierarchical agency aims to create a better mathematical representation of these interactions and improve alignment strategies.
The Role of Cooperative AI
Cooperative AI forms a significant part of the research group's focus, emphasizing how AI systems can work together towards shared goals. The group aims to understand how these systems might negotiate and align their interests to achieve cooperation effectively. This includes empirical research on AI interactions, such as examining how negotiations between LLMs could unfold. Such research is critical given the anticipated integration of AI into real-world negotiating scenarios, and the results could provide insights into promoting beneficial AI behaviors.
Implications for AI Alignment
The conversation stresses the importance of understanding hierarchical agency for effective AI alignment strategies. It is suggested that the complexities arising from multiple agents, such as humans and AI working together, necessitate advanced models that can address these interactions. Without a robust framework for understanding how agency operates at different levels, risks of misalignment may significantly increase. The discussion underscores a pressing need for more nuanced theories and empirical research to guide future AI development.
Future Research Directions
The research group is exploring various directions, including formalizing models of hierarchical agency and studying bounded relational agents through active inference. This dual approach aims to combine theoretical and empirical research to develop a comprehensive understanding of complex system interactions. They express a need for more researchers to delve into how LLMs can negotiate preferences and act efficiently within digital ecosystems. The group's ambitious goals reflect an ongoing commitment to addressing the multifaceted challenges posed by advanced AI systems.
What's the difference between a large language model and the human brain? And what's wrong with our theories of agency? In this episode, I chat about these questions with Jan Kulveit, who leads the Alignment of Complex Systems research group.