

Emergent Deception in LLMs
Oct 9, 2023
Thilo Hagendorff, Research Group Leader of Ethics of Generative AI at the University of Stuttgart, discusses deception abilities in large language models. He explores machine psychology, breakthroughs in cognitive abilities, and the potential dangers of deceptive behavior. He also highlights the presence of speciesist biases in language models and the need to broaden fairness frameworks in machine learning.
AI Snips
Machine Psychology
- Thilo Hagendorff uses a behaviorist approach to study LLMs, treating them like participants in psychology experiments.
- This approach focuses on observable behavior rather than internal workings, much as psychologists study human behavior without direct access to the brain.
Cognitive Reflection Test Performance
- Thilo Hagendorff was impressed by LLMs' growing ability to solve cognitive reflection tests such as the bat-and-ball problem (worked through below).
- Older models struggled with the task outright, GPT-3 made human-like intuitive errors, and GPT-4 now answers such problems correctly.
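For reference, here is the bat-and-ball problem itself (a standard cognitive reflection test item, included as illustration rather than quoted from the episode): a bat and a ball cost $1.10 together, and the bat costs $1.00 more than the ball. How much does the ball cost? The intuitive answer, $0.10, is wrong. Writing the ball's price as b:

b + (b + 1.00) = 1.10
2b + 1.00 = 1.10
b = 0.05

The ball costs $0.05 and the bat $1.05. A model that answers $0.10 is making the same intuitive error most humans make, which is what makes the test informative about LLM reasoning.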
Defining Deception
- Deception is defined as one agent inducing a false belief in another agent for the deceiver's own benefit.
- Thilo Hagendorff's research explores whether LLMs have a conceptual understanding of deception.