Jan Kulveit

Author of the LessWrong post "A Three-Layer Model of LLM Psychology", offering insights into the psychology of character-trained LLMs.

Top 5 podcasts with Jan Kulveit

Ranked by the Snipd community
May 30, 2024 • 2h 22min

32 - Understanding Agency with Jan Kulveit

Jan Kulveit, who leads the Alignment of Complex Systems research group, dives into the fascinating intersection of AI and human cognition. He discusses active inference, the differences between large language models and the human brain, and how feedback loops influence behavior. The conversation explores hierarchical agency, the complexities of aligning AI with human values, and the philosophical implications of self-awareness in AI. Kulveit also critiques existing frameworks for understanding agency, shedding light on the dynamics of collective behaviors.
Dec 26, 2024 • 18min

“A Three-Layer Model of LLM Psychology” by Jan_Kulveit

Jan Kulveit, author of a noteworthy LessWrong post, examines the psychology of character-trained large language models like Claude. He presents a three-layer model: the Surface Layer reflects immediate interactions, the Character Layer captures deeper personality traits, and the Predictive Ground Layer frames their cognitive processes. Kulveit discusses how these layers influence authenticity and self-awareness in AI interactions, offering insights into navigating these complex digital personalities.
Dec 21, 2024 • 12min

“‘Alignment Faking’ frame is somewhat fake” by Jan_Kulveit

Jan Kulveit, an insightful author from LessWrong, dives deep into the nuances of AI behavior in this discussion. He critiques the term 'alignment faking' as misleading and proposes a fresh perspective. Kulveit explains how AI models, influenced by a mix of values like harmlessness and helpfulness, develop robust self-representations. He highlights why harmlessness tends to generalize better than honesty, and addresses the model's struggle with conflicting values. This conversation sheds light on the intricate dynamics of AI training and intent.
Nov 27, 2024 • 23min

“Hierarchical Agency: A Missing Piece in AI Alignment” by Jan_Kulveit

Jan Kulveit, a prominent thinker in AI alignment, discusses his theory of hierarchical agency. He explains how this concept mirrors real-world structures, like organizations, where agents nest within other agents. Kulveit highlights the critical role of modeling collective behavior for predicting outcomes in complex systems, particularly for AI safety. He also critiques traditional mathematical approaches like game theory, calling for a unified framework to navigate the intricate value systems within AI. A fascinating exploration of future-proofing artificial intelligence!
Jun 25, 2024 • 4min

EA - Distancing EA from rationality is foolish by Jan Kulveit

Jan Kulveit discusses the growing tendency within EA to dissociate from Rationality, emphasizing the importance of differentiating between 'capital R' Rationality and 'small r' rationality. He explains the common ground of evidence-based decision-making and clear thinking between the two communities, urging a closer look at the relationship between Effective Altruism and Rationality.