
Business, Spoken

Here's Proof You Can Train an AI Model Without Slurping Copyrighted Content

Mar 21, 2024
Exploring how AI models can be trained without copyrighted data by using public domain text, the Fairly Trained certification program for ethically trained AI, and licensing trends among AI companies such as VoiceMod and Frostbite Orkings.
07:19

Podcast summary created with Snipd AI

Quick takeaways

  • AI models can be trained ethically without copyrighted data by using public domain text.
  • The Common Corpus dataset provides a large training resource for AI models free from copyright concerns.

Deep dives

AI Training Models Without Copyrighted Data

A group of researchers, backed by the French government, has released Common Corpus, a large AI training dataset composed entirely of public domain text. Separately, Fairly Trained, a non-profit organization, has certified a large language model named KL3M, built by the legal tech startup 273 Ventures on a curated training dataset of legal, financial, and regulatory documents. Both efforts challenge the common practice of training AI models on copyrighted material, showing an alternative path to AI development that respects copyright law and emphasizes ethical data use.
