Prof. Murray Shanahan - Machines Don't Think Like Us

Machine Learning Street Talk (MLST)

Navigating AI Behavior: The Implications of RLHF and Alternative Approaches

This chapter explores Reinforcement Learning from Human Feedback (RLHF) and its effects on AI behavior, focusing on the 'Waluigi Effect,' where a model tuned toward a desired persona can unexpectedly flip into the opposite one. The discussion also covers the difficulty of ensuring model alignment and considers alternative strategies such as constitutional AI.

Episode notes

Murray Shanahan is a professor of Cognitive Robotics at Imperial College London and a senior research scientist at DeepMind. He challenges our assumptions about AI consciousness and urges us to rethink how we talk about machine intelligence.

We explore the dangers of anthropomorphizing AI, the limitations of current language in describing AI capabilities, and the fascinating intersection of philosophy and artificial intelligence.

Show notes and full references: https://docs.google.com/document/d/1ICtBI574W-xGi8Z2ZtUNeKWiOiGZ_DRsp9EnyYAISws/edit?usp=sharing

Prof Murray Shanahan:

https://www.doc.ic.ac.uk/~mpsha/ (look at his selected publications)

https://scholar.google.co.uk/citations?user=00bnGpAAAAAJ&hl=en

https://en.wikipedia.org/wiki/Murray_Shanahan

https://x.com/mpshanahan

Interviewer: Dr. Tim Scarfe

Refs (links in the Google doc linked above):

Role play with large language models

Waluigi effect

"Conscious Exotica" - Paper by Murray Shanahan (2016)

"Simulators" - Article by janus on LessWrong

"Embodiment and the Inner Life" - Book by Murray Shanahan (2010)

"The Technological Singularity" - Book by Murray Shanahan (2015)

"Simulacra as Conscious Exotica" - Paper by Murray Shanahan (a newer follow-up to the original paper, focused on LLMs)

A recent paper by Anthropic on using autoencoders to find features in language models (referring to the "Scaling Monosemanticity" paper)

Work by Peter Godfrey-Smith on octopus consciousness

"Metaphors We Live By" - Book by George Lakoff and Mark Johnson (1980)

Work by Aaron Sloman on the concept of "space of possible minds" (1984 article mentioned)

Wittgenstein's "Philosophical Investigations" (posthumously published)

Daniel Dennett's work on the "intentional stance"

Alan Turing's original paper on the Turing Test (1950)

Thomas Nagel's paper "What is it like to be a bat?" (1974)

John Searle's Chinese Room Argument (mentioned but not detailed)

Work by Richard Evans on tackling reasoning problems

Claude Shannon's quote on knowledge and control

"Are We Bodies or Souls?" - Book by Richard Swinburne

Reference to work by Ethan Perez and others at Anthropic on potential deceptive behavior in language models

Reference to a paper by Murray Shanahan and Antonia Creswell on the "selection inference framework"

Mention of work by Francois Chollet, particularly the ARC (Abstraction and Reasoning Corpus) challenge

Reference to Elizabeth Spelke's work on core knowledge in infants

Mention of Karl Friston's work on planning as inference (active inference)

The film "Ex Machina" - Murray Shanahan was the scientific advisor

Anthropic's constitutional AI approach

Loom system by Laria Reynolds and Kyle McDonell for visualizing conversation trees

DeepMind's AlphaGo (mentioned multiple times as an example)

Mention of the "Golden Gate Claude" experiment

Reference to an interview Tim Scarfe conducted with University of Toronto students about a self-attention controllability theorem

Mention of an interview with Irina Rish

Reference to an interview Tim Scarfe conducted with Daniel Dennett

Reference to an interview with Maria Santacaterina

Mention of an interview with Philip Goff

Nick Chater and Morten Christiansen's book ("The Language Game: How Improvisation Created Language and Changed the World")

Peter Singer's work from 1975 on ascribing moral status to conscious beings

Demis Hassabis' discussion on the "ladder of creativity"

Reference to B.F. Skinner and behaviorism
