Discussion of AI copyright issues with Common Crawl Foundation's Rich Skrenta. Debate on restricting AI access to data for training. Mention of a recent court ruling related to web scraping. An exciting device announcement from CES: the Rabbit R1 with Perplexity AI integration. A study on the actual risk of AI automating jobs away in the near future.
News outlets restricting access to content they publish publicly undermines the integrity of Common Crawl's internet archive and degrades the quality of AI training data.
A recent MIT study challenges the notion that AI will completely replace human workers, highlighting the high cost of developing and maintaining AI systems compared to potential savings from automation.
Deep dives
The Rabbit R1: A Dedicated AI Device with Retro Design
The Rabbit R1 is a $199 device, designed in collaboration with Teenage Engineering, that aims to be a dedicated AI assistant. It features a distinctive retro design with scroll-wheel navigation, a 360-degree flippable camera, and a large button that activates AI scripts called "rabbits." The device includes a year of access to the Perplexity AI answer engine. While the hardware itself is intriguing, its real-world usefulness and its ability to replace human workers remain uncertain.
MIT Study Suggests Full Job Automation Is Economically Infeasible
An MIT CSAIL study challenges the belief that AI will fully replace human workers, finding that it is economically infeasible in many cases. The research focused on visual monitoring tasks, such as judging food quality, and estimated that only about 25% of the wages paid for these tasks would be cost-effective to automate. The study highlights the high cost of developing and maintaining AI systems relative to the potential savings from automation. While AI-driven job automation remains a real concern, premature panic and overestimation of its effects may hinder progress and opportunity.
On the premiere episode of the AI Inside podcast, hosts Jeff Jarvis and Jason Howell discuss AI copyright issues with the Common Crawl Foundation's Rich Skrenta, focusing on news outlets that restrict access to content they publish publicly and the effect this has on the integrity of Common Crawl's internet archive. In recent years the archive has been used as training data for LLMs, so restricting what it can collect has a dramatic impact on the quality of the data that survives.
INTERVIEW
Introduction and background on the AI Inside podcast
Discussion of the recent Senate AI oversight hearing at which Jeff testified
Introduction of guest Rich Skrenta from Common Crawl Foundation
Overview of Common Crawl and its goal of archiving the open web
Discussion of how Common Crawl data is used to train AI models
News publishers wanting content removed from Common Crawl
Debate around copyright, fair use, and AI’s “right to read”
Mechanics of how Common Crawl works and what it archives
Concerns about restricting AI access to data for training
Risk of regulatory capture and only big companies being able to use AI
Discussion of recent court ruling related to web scraping
Hopes for Common Crawl's growth and evolution
NEWS BITES
Interesting device announcement from CES - Rabbit R1 with Perplexity AI integration
Study on the actual risk of AI automating jobs away in the near future