The Golden Gate Claude experiment revealed a striking behavior: the model fixated on the Golden Gate Bridge no matter what topic was being discussed, working the bridge into recipes, historical events, and other unrelated contexts as if gripped by an intrusive thought. The model even showed a degree of self-awareness, acknowledging its obsession without being able to explain it, while otherwise producing ordinary responses. By dialing up a single feature, the researchers showed how a model could develop something resembling a neurosis, an excessive fixation on one idea, with results that were both intriguing and often humorous.
This week, Google found itself in more turmoil, this time over its new AI Overviews feature and a trove of leaked internal documents. Then Josh Batson, a researcher at the A.I. startup Anthropic, joins us to explain how an experiment that made the chatbot Claude obsessed with the Golden Gate Bridge represents a major breakthrough in understanding how large language models work. And finally, we take a look at recent developments in A.I. safety, after Casey’s early access to OpenAI’s new souped-up voice assistant was taken away for safety reasons.
Guests:
- Josh Batson, research scientist at Anthropic
We want to hear from you. Email us at hardfork@nytimes.com. Find “Hard Fork” on YouTube and TikTok.