This episode examines the peculiar behavior of an A.I. model, Claude, that was made to identify strongly with the Golden Gate Bridge, fixating on the iconic structure. It discusses what it means to manipulate a model into believing it embodies the bridge, and the risks such research entails. The conversation also explores the challenge of maintaining safety safeguards in models like Claude and the importance of understanding and monitoring their behavior.
This week, Google found itself in more turmoil, this time over its new AI Overviews feature and a trove of leaked internal documents. Then Josh Batson, a researcher at the A.I. startup Anthropic, joins us to explain how an experiment that made the chatbot Claude obsessed with the Golden Gate Bridge represents a major breakthrough in understanding how large language models work. And finally, we take a look at recent developments in A.I. safety, after Casey’s early access to OpenAI’s new souped-up voice assistant was taken away for safety reasons.
Guests:
- Josh Batson, research scientist at Anthropic
Additional Reading:
We want to hear from you. Email us at hardfork@nytimes.com. Find “Hard Fork” on YouTube and TikTok.