Interdisciplinary Perspective on AI Proxy Failures
In this story, we discuss a recent paper on why proxy goals fail. We first introduce proxy gaming, then summarize the paper’s findings.
Proxy gaming is a well-documented failure mode in AI safety. For example, social media platforms use AI systems to recommend content to users. These systems are sometimes built to maximize the amount of time a user spends on the platform, on the assumption that time spent approximates the quality of the recommended content. However, a user might spend even more time on a platform because they’re responding to an enraging post or interacting [...]
---
Outline:
(00:13) Interdisciplinary Perspective on AI Proxy Failures
(06:06) A Flurry of AI Fundraising and Model Releases
(12:53) Adversarial Inputs Make Chatbots Misbehave
(15:52) Links
---
First published:
July 5th, 2023
Source:
https://newsletter.safe.ai/p/ai-safety-newsletter-13
---
Narrated by TYPE III AUDIO.