Data Skeptic

Emergent Deception in LLMs

Oct 9, 2023
Thilo Hagendorff, Research Group Leader of Ethics of Generative AI at the University of Stuttgart, discusses deception abilities in large language models. He explores machine psychology, breakthroughs in cognitive abilities, and the potential dangers of deceptive behavior. He also highlights the presence of speciesist biases in language models and the need to broaden fairness frameworks in machine learning.
INSIGHT

Machine Psychology

  • Thilo Hagendorff uses a behaviorist approach to study LLMs, treating them like participants in psychology experiments.
  • This approach focuses on observable behavior rather than internal workings, similar to studying the human brain.
ANECDOTE

Cognitive Reflection Test Performance

  • Thilo Hagendorff was impressed by LLMs' increasing ability to solve cognitive reflection tests like the bat-and-ball problem.
  • Older models failed outright; later ones like GPT-3 made human-like intuitive errors, while GPT-4 performs strongly.
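The bat-and-ball problem referenced above is the standard cognitive reflection test item: a bat and a ball cost $1.10 together, and the bat costs $1.00 more than the ball. The intuitive answer ("the ball costs $0.10") is wrong; a minimal worked solution:

```python
# Bat-and-ball CRT item:
#   ball + bat = 1.10  and  bat = ball + 1.00
# Substituting:  2 * ball + 1.00 = 1.10  =>  ball = 0.05
ball = (1.10 - 1.00) / 2
bat = ball + 1.00
print(f"ball = ${ball:.2f}, bat = ${bat:.2f}")  # ball = $0.05, bat = $1.05
```

The intuitive-but-wrong $0.10 answer is exactly the kind of error GPT-3-era models reproduced, whereas GPT-4 typically gives the correct $0.05.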
INSIGHT

Defining Deception

  • Deception is defined as one agent inducing a false belief in another for its own benefit.
  • Thilo Hagendorff's research explores whether LLMs have a conceptual understanding of deception.