On Point | Podcast

What happens when you train your AI on AI-generated data?

May 19, 2025
Ari Morcos, co-founder and CEO of DatologyAI, and Kalyan Veeramachaneni, co-founder and CEO of DataCebo, dive deep into the role of synthetic data in AI training. They discuss how AI systems might generate their own training data to address the shortage of high-quality real-world data. The two explore the pros and cons of synthetic data, its critical role in applications like fraud detection, and the challenges of ensuring data integrity. They highlight the need to balance synthetic and real data to keep AI reliable.
INSIGHT

AI Training Data Limits Loom

  • AI models are initially trained on vast amounts of real-world data, scanning billions of text tokens to learn language patterns.
  • Current research suggests that the supply of this real-world data is close to running out, which is prompting the use of synthetic data.
INSIGHT

Data Use Efficiency Over Volume

  • Although new data is created all the time, the public real-world data available for training AI models is largely exhausted.
  • Using existing data more efficiently offers far greater gains than simply collecting more of it.
ANECDOTE

Edge Cases in Self-Driving Cars

  • In training, self-driving cars mostly encounter common highway conditions, but on the road they also face rare, complex edge cases such as pedestrians or obstacles.
  • Modeling these edge cases with synthetic data is critical to preventing accidents and improving safety.