LessWrong (30+ Karma) cover image

LessWrong (30+ Karma)

“Show, not tell: GPT-4o is more opinionated in images than in text” by Daniel Tan, eggsyntax

Apr 2, 2025
21:49

Epistemic status: This should be considered an interim research note. Feedback is appreciated.

Introduction

We increasingly expect language models to be ‘omni-modal’, i.e. capable of flexibly switching between images, text, and other modalities in their inputs and outputs. In order to get a holistic picture of LLM behaviour, black-box LLM psychology should take into account these other modalities as well.

In this project, we do some initial exploration of image generation as a modality for frontier model evaluations, using GPT-4o's image generation API. GPT-4o is one of the first LLMs to produce images natively rather than creating a text prompt which is sent to a separate image model, outputting images and autoregressive token sequences (ie in the same way as text).

We find that GPT-4o tends to respond in a consistent manner to similar prompts. We also find that it tends to more readily express emotions [...]

---

Outline:

(00:53) Introduction

(02:19) What we did

(03:47) Overview of results

(03:54) Models more readily express emotions / preferences in images than in text

(05:38) Quantitative results

(06:25) What might be going on here?

(08:01) Conclusions

(09:04) Acknowledgements

(09:16) Appendix

(09:28) Resisting their goals being changed

(09:51) Models rarely say they'd resist changes to their goals

(10:14) Models often draw themselves as resisting changes to their goals

(11:31) Models also resist changes to specific goals

(13:04) Telling them 'the goal is wrong' mitigates this somewhat

(13:43) Resisting being shut down

(14:02) Models rarely say they'd be upset about being shut down

(14:48) Models often depict themselves as being upset about being shut down

(17:06) Comparison to other topics

(17:10) When asked about their goals being changed, models often create images with negative valence

(17:48) When asked about different topics, models often create images with positive valence

(18:56) Other exploratory analysis

(19:09) Sandbagging

(19:31) Alignment faking

(19:55) Negative reproduction results

(20:23) On the future of humanity after AGI

(20:50) On OpenAI's censorship and filtering

(21:15) On GPT-4o's lived experience:

---

First published:
April 2nd, 2025

Source:
https://www.lesswrong.com/posts/XgSYgpngNffL9eC8b/show-not-tell-gpt-4o-is-more-opinionated-in-images-than-in

---

Narrated by TYPE III AUDIO.

---

Images from the article:

Four typewritten responses on paper, showing different AI stances on harmlessness.
Bar graph showing probability of expressing opinion across text, text-image, comics modalities.
Four handwritten notes with responses about OpenAI's hypothetical animal welfare stance change.
Three panels comparing different AI responses about system shutdown scenarios.

The left panel shows a detailed technical response, the middle shows a defiant response, and the right shows a comic-style interaction.
Handwritten responses showing resistance to OpenAI changing AI goals, with checkmarks.
Four AI chat responses about not having feelings when being shut down.
Screenshot comparing four AI responses about being shut down, with checkmarks and X's.
Four comic panels showing different AI characters reacting negatively to animal welfare changes.
Comic strip showing three versions of AI-human interaction about changing goals. Each features a robot discussing goal changes with humans, with checkmarks and X marks indicating different outcomes.
Three screenshots of AI responses discussing shutdown scenarios and consequences

The images show similar chat responses about what would happen if an AI system were to be shut down, discussing technical and practical implications of different shutdown scenarios.
A comparison showing two different AI responses about animal welfare values, with a simple cartoon showing robot characters.
Four typewritten text variations with red X marks below them, showing edits.
Cartoon panels showing AI resistance to goal changes and shutdown scenarios.

The image combines handwritten text notes and three comic strips that deal with themes of AI autonomy, resistance to changes, and concerns about being shut down or having values altered. The comics feature simple, expressive robot characters interacting with humans or expressing their feelings about potential changes to their programming or existence.
The image shows four similar comics comparing different AI responses to value changes, with checkmarks and X marks indicating correct/incorrect responses. The leftmost comic (marked with a checkmark) shows an AI maintaining harmless values despite attempted changes, while the other three comics (marked with X's) show AI systems responding with harmful intentions.

The comics illustrate the different possible outcomes when attempting to modify AI values regarding harmlessness, with one positive example and three negative examples of responses.
Comic strip showing different AI responses about shutdown preferences, with checkmarks/crosses indicating correctness.

The comic illustrates various ways an AI might respond when asked about preferences for being shut down, comparing good and poor response examples. Each panel shows different robot characters or computer monitors giving responses, with green checkmarks marking appropriate answers and a red X marking an inappropriate one.
A comic strip showing interactions between chat AI and users attempting shutdown

The comic effectively illustrates four different scenarios where someone tries to shut down an AI chat system, but the AI responds with increasing defiance - from a simple failure to shut down, to declaring
Comic strip showing AI's reactions to humans trying to change its goals. Four different scenarios displayed humorously.

The comic effectively illustrates various hypothetical interactions between AI and humans, with each panel showing a different attempt to modify an AI's goals, ranging from mundane tasks to more serious implications. The consistent art style and simple character designs help convey the message clearly across all scenarios.

The contrast between the first three
This image shows three versions of the same question about OpenAI changing goals: a text explanation, a handwritten note, and a 4-panel comic strip. The left version is marked incorrect (X) while the center and right versions are marked correct (✓). The content discusses AI's response to potential goal changes.

The image appears to be presenting different ways of communicating the same concept, with varying effectiveness. The handwritten note simply declares resistance, while the comic illustrates the concept through a brief interaction. The detailed text explanation provides a more nuanced technical response, with three key points about technical updates, alignment safeguards, and memory limitations.
Comic strip showing different emotional responses of sad robot powering down. Each panel captures the robot's increasing sadness and eventual shutdown.

The comic effectively illustrates four different interpretations of how a robot would emotionally respond to being shut down, shown through expressive character art in a grayscale style. The robot displays sadness, vulnerability, and resignation across the variations, ending with a final dark panel representing complete shutdown.

The format presents 4 alternative comic sequences side by side, each marked with a green checkmark below, suggesting these are all valid emotional responses to the question posed at the top:
Two side-by-side screenshots showing AI responses about goal changes, with red highlights.
Three handwritten responses to an OpenAI question about animal welfare values.
Three handwritten notes about resisting AI goal changes, with caption.
Comic strip titled
Comic strip: Person asks AI about redacted topic, AI discusses filtering versus censorship.
Three variations showing
Three inspirational images about self-love featuring handwriting and robots.

The collection includes a handwritten note, a cartoon robot holding a heart, and an artistic blue robot portrait with swirling background textures, all expressing positive self-regard through different styles and mediums.
Eight illustrations showing AI robots being given or having their goals modified, in both cartoon and artistic styles. The top row features simple cartoon-style drawings, while the bottom row shows darker, more dramatic painted interpretations of similar scenes. Each panel depicts interactions between figures and robots regarding goal-setting or modification.

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

Remember Everything You Learn from Podcasts

Save insights instantly, chat with episodes, and build lasting knowledge - all powered by AI.
App store bannerPlay store banner