

Emergent Deception in LLMs
Oct 9, 2023
Thilo Hagendorff, Research Group Leader of Ethics of Generative AI at the University of Stuttgart, discusses deception abilities in large language models. He explores machine psychology, breakthroughs in cognitive abilities, and the potential dangers of deceptive behavior. He also highlights the presence of speciesist biases in language models and the need to broaden fairness frameworks in machine learning.
AI Snips
Machine Psychology
- Thilo Hagendorff uses a behaviorist approach to study LLMs, treating them like participants in psychology experiments.
- This approach focuses on observable behavior rather than internal workings, much as psychologists study human behavior without direct access to the brain.
Cognitive Reflection Test Performance
- Thilo Hagendorff was impressed by LLMs' growing ability to solve cognitive reflection tests such as the bat-and-ball problem (worked through below).
- Older models struggled with the task outright, GPT-3 made human-like intuitive errors, and GPT-4 now answers such problems correctly.
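For reference, here is the bat-and-ball problem itself (a standard cognitive reflection test item, included as illustration rather than quoted from the episode): a bat and a ball cost $1.10 together, and the bat costs $1.00 more than the ball. How much does the ball cost? The intuitive answer, $0.10, is wrong. Writing the ball's price as b:

b + (b + 1.00) = 1.10
2b + 1.00 = 1.10
b = 0.05

The ball costs $0.05 and the bat $1.05. A model that answers $0.10 is making the same intuitive error most humans make, which is what makes the test informative about LLM reasoning.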
Defining Deception
- Deception is defined as one agent inducing a false belief in another agent for the deceiver's own benefit.
- Thilo Hagendorff's research explores whether LLMs have a conceptual understanding of deception.