The Fight Against AI Comes to a Foundational Data Set

Security, Spoken

The Conflict Surrounding Common Crawl and Copyright Issues

This chapter covers the escalating conflict between media outlets and AI companies over data redactions in Common Crawl, which has raised tensions over copyright and the use of the dataset for AI training. As the dispute unfolds, major news and media sites are increasingly blocking Common Crawl's web crawler, CCBot, and academic research is significantly affected.
