Sam, a journalist known for her coverage of tech controversies, and Joseph, an internet culture enthusiast, dive into some hot topics. They discuss the uprising against AI scraping, spotlighting how websites are using robots.txt to fight back. Sam unveils her scoop on Runway's unauthorized scraping of videos from YouTube creators, which raises questions about copyright. Plus, they explore how Reddit’s exclusive deal with Google shapes search visibility, leaving other search engines in the dust. Lastly, the crew fills Joseph in on the bizarre world of Skibidi.
Website owners are increasingly updating their robots.txt files to block unauthorized AI scrapers, highlighting growing concerns over intellectual property protection.
Google's exclusive permission to scrape Reddit's content raises questions about internet accessibility and the competitive landscape of search engines.
Deep dives
The Growing Backlash Against AI Scraping
A recent study highlights a measurable backlash from website owners against AI scraping technologies. The Data Provenance Initiative analyzed over 14,000 websites and found that around 5% had updated their robots.txt files to block AI scrapers, a notable increase from zero just a year prior. Among the most actively maintained websites, the blocking rate reached 28%, indicating significant concern over the unauthorized use of their content. This trend suggests that website owners are becoming increasingly proactive in safeguarding their intellectual property from AI companies that scrape their data.
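As a rough illustration of what such a robots.txt rule looks like and how a well-behaved crawler would interpret it, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `is_allowed` helper, and the example.com URL are assumptions made for this example, and `GPTBot` stands in only as one example of an AI crawler's user-agent name; none of this comes from the study itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"  # placeholder URL
    for agent in ("GPTBot", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, url))
```

Note that robots.txt is a voluntary convention: a rule like this only affects crawlers that choose to read and honor it.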
Misidentification of AI Scrapers
The same study also found that many websites are unintentionally blocking the wrong AI scrapers. For example, numerous sites blocked bots associated with the AI company Anthropic, yet these bots were not the ones actively scraping their content. This misidentification stems from the proliferation of AI scrapers: website owners block the bots they know about while inadvertently leaving the door open to others that slip past their defenses. The confusion has pushed many sites toward a blunt approach to blocking scrapers, which may inadvertently shut out crawlers that matter for search indexing and archiving.
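The gap described above can be checked mechanically. The sketch below, again using Python's standard `urllib.robotparser`, lists which crawler names a given robots.txt still lets through. The names `LegacyAIBot` and `CurrentAIBot` and the `unblocked_agents` helper are hypothetical, invented for this illustration; they are not the actual bot names from the study.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only an outdated crawler name.
ROBOTS_TXT = """\
User-agent: LegacyAIBot
Disallow: /
"""


def unblocked_agents(robots_txt, agents, url="https://example.com/"):
    """Return the agents from `agents` that robots_txt still allows to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # The site owner believes both crawlers are blocked; only one is.
    crawlers = ["LegacyAIBot", "CurrentAIBot"]
    print("Still allowed:", unblocked_agents(ROBOTS_TXT, crawlers))
```

Because a robots.txt rule applies only to the user-agent names it lists, any crawler operating under a name missing from the file is allowed by default.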
Google’s Exclusive Deal with Reddit
In a significant development, only Google now has permission to scrape Reddit’s content, while other search engines like Bing and DuckDuckGo find themselves unable to access any Reddit data. This exclusivity stems from Reddit's effort to monetize its user-generated content, which led to a financial agreement with Google. As a result, only Google's search engine can retrieve and display Reddit content, which may severely limit users' options for finding that information through alternative search engines. The arrangement has ignited debate about its impact on internet accessibility and on competition among search engines.
AI Video Maker's Controversial Data Practices
A leaked document revealed that AI video generator Runway scraped a vast number of YouTube videos without permission, relying on a comprehensive spreadsheet detailing content sources. The channels and websites listed included major content creators and brands, showing how Runway’s generative model draws on unauthorized data to create videos styled after those found on YouTube. This practice raises ethical concerns about the ownership of content and the rights of individual creators whose work may be used to train AI systems without consent. The case underscores the challenges that arise as AI technologies intersect with traditional content creation.
We've got a bumper episode today. First off, a whole series of stories about robots.txt and its increased use against AI, as well as Reddit only being available in Google results now (and not in other search engines like DuckDuckGo). Then after the break, Sam discusses her amazing scoop, which showed that multibillion-dollar company Runway scraped videos from individual YouTube creators. In the subscribers-only section, everyone educates Joseph on what Skibidi is.