Interdisciplinary Perspective on AI Proxy Failures

In this story, we discuss a recent paper on why proxy goals fail. We first introduce proxy gaming, then summarize the paper’s findings.

Proxy gaming is a well-documented failure mode in AI safety. For example, social media platforms use AI systems to recommend content to users. These systems are sometimes built to maximize the amount of time a user spends on the platform. The idea is that the time the user spends on the platform approximates the quality of the content being recommended. However, a user might spend even more time on a platform because they’re responding to an enraging post or interacting [...]

---

Outline:

(00:13) Interdisciplinary Perspective on AI Proxy Failures

(06:06) A Flurry of AI Fundraising and Model Releases

(12:53) Adversarial Inputs Make Chatbots Misbehave

(15:52) Links

---

First published: July 5th, 2023

Source: https://newsletter.safe.ai/p/ai-safety-newsletter-13

---


Narrated by TYPE III AUDIO.

AISN #13: An interdisciplinary perspective on AI proxy failures, new competitors to ChatGPT, and prompting language models to misbehave.