

Will AI Eat Itself?
Jan 22, 2025
Julia Kemper, a data scientist at NYU who specializes in AI model outputs, and Shayne Longpre, a PhD candidate at MIT leading the Data Provenance Initiative, discuss the alarming concept of 'model collapse.' They explore how AI's growing reliance on AI-generated training data risks homogeneous, bland outputs. Kemper highlights how hard it is to keep improving AI performance under such conditions, while Longpre emphasizes the crucial role of human curation in raising the quality of AI training data. Together, they envision a future where human creativity revitalizes AI's capabilities.
Model Collapse Definition
- Model collapse is a degenerative process where AI-generated data pollutes the training set of the next generation.
- Models trained on this polluted data develop an increasingly distorted picture of reality.
Turkey Thanksgiving Example
- Researchers asked an LLM how to cook a turkey for Thanksgiving across successive model generations.
- After four generations, the LLM responded with existential questions instead of cooking instructions.
Regression to the Mean
- AI models trained on increasingly average data lose their ability to generate diverse outputs.
- This leads to blander outputs, lacking the quirks and outliers present in human-generated data.
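The regression-to-the-mean dynamic described above can be sketched with a toy simulation (an illustration only, not anything from the episode: the Gaussian model, sample size, and generation count are all assumptions). Each "generation" fits a simple model to the previous generation's output and then samples its own training data from that fit; the tails shrink and the data drifts toward a narrow average:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_generation(data, n_samples=20):
    # "Train" a toy model: fit a Gaussian to the data (mean and std),
    # then generate the next generation's training set by sampling
    # from the fitted model instead of from reality.
    mu, sigma = data.mean(), data.std()
    return rng.normal(mu, sigma, n_samples)

# Generation 0: "human-generated" data with real diversity.
data = rng.normal(loc=0.0, scale=1.0, size=20)
initial_std = data.std()

# Each generation trains only on the previous generation's output.
for _ in range(500):
    data = train_generation(data)

# The spread collapses: every refit slightly underestimates the
# variance, and the loss compounds across generations, so rare,
# "quirky" values stop being generated at all.
print(f"generation 0 std: {initial_std:.3f}")
print(f"generation 500 std: {data.std():.6f}")
```

The same compounding loss is what the hosts describe in language models: once outliers fall out of one generation's training set, no later generation can recover them.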