Ethical Dilemmas in AI Data Sourcing


The chapter examines the ethical questions raised by how AI companies source their training data, focusing on tech giants' controversial decisions to train AI models on copyrighted material. It covers the legal and ethical implications of scraping YouTube videos and using copyrighted content, practices that have sparked debate over fair use and copyright infringement. The conversation also explores the industry's dependence on acquiring ever more data, how data scarcity could limit AI development, and the emergence of synthetic data as a way around those limits.

This week, the companies building artificial intelligence are facing a limit to what training data is publicly available on the internet. Will that stop them from building God? Then, a new bipartisan national privacy law proposal just dropped. We ask what’s in it. And finally, ByteDance is building new apps instead of fighting Congress’s TikTok ban.

Today’s Guests:

Trevor Hughes, president and C.E.O. of the International Association of Privacy Professionals

Additional Reading:

How Tech Giants Cut Corners to Harvest Data for A.I.

For Data-Guzzling A.I. Companies, the Internet Is Too Small

Lawmakers unveil sprawling plan to expand online privacy protections

TikTok Turns to Nuns, Veterans and Ranchers in Marketing Blitz

We want to hear from you. Email us at hardfork@nytimes.com.
Find “Hard Fork” on YouTube and TikTok.