Tragedy of the (data) commons

Oct 25, 2024

Shayne Longpre, an MIT PhD student involved in the Data Provenance Initiative, and Robert Mahari, a researcher at MIT Media Lab and Harvard Law School, discuss key issues in AI data ethics. They explain why transparency about AI training data matters and how the shrinking pool of publicly available data threatens innovation. Drawing on their study "Consent in Crisis", they describe the complexities of data provenance and attribution in generative AI and argue for better consent protocols to safeguard community resources.

Chapters

00:00 Intro (2 min)
02:27 Navigating Data Provenance in AI Training (9 min)
11:31 Exploring User Patterns in Generative AI (3 min)
14:58 Attribution Challenges and the Role of Retrieval in Generative AI (5 min)
20:23 The Data Commons Debate (10 min)