The Artificial Human

Will AI Eat Itself?

Jan 22, 2025
Julia Kemper, a data scientist at NYU who specializes in AI model outputs, and Shayne Longpre, a PhD candidate at MIT leading the Data Provenance Initiative, discuss the alarming concept of 'model collapse.' They explore how AI's reliance on AI-generated training data risks producing homogeneous, bland outputs. Kemper highlights the challenges of improving AI performance under such conditions, while Longpre emphasizes the crucial role of human curation in raising the quality of AI training data. Together, they envision a future where human creativity revitalizes AI's capabilities.
INSIGHT

Model Collapse Definition

  • Model collapse is a degenerative process in which AI-generated data pollutes the training set of each successive model generation.
  • Models trained on this polluted data develop an increasingly distorted picture of reality.
ANECDOTE

Turkey Thanksgiving Example

  • Researchers repeatedly asked an LLM how to cook a turkey for Thanksgiving, feeding its answers back into training.
  • By the fourth generation, the LLM responded with existential questions instead of cooking instructions.
INSIGHT

Regression to the Mean

  • AI models trained on increasingly average data lose their ability to generate diverse outputs.
  • The result is blander output, missing the quirks and outliers present in human-generated data.