This chapter discusses two cases in which giant tech companies used copyrighted material without permission: Google's attempt to create a digital library without asking authors, and the alleged theft at the heart of OpenAI's ChatGPT. It explores the concept of fair use in copyright law, how complicated fair use becomes when applied to AI-generated content, and the potential consequences of legal battles between tech companies and creators.
When best-selling thriller writer Douglas Preston began playing around with OpenAI's new chatbot, ChatGPT, he was, at first, impressed. But then he realized how much in-depth knowledge GPT had of the books he had written. When prompted, it supplied detailed plot summaries and descriptions of even minor characters. He was convinced it could only pull that off if it had read his books.
Large language models, the kind of artificial intelligence underlying programs like ChatGPT, do not come into the world fully formed. They first have to be trained on incredibly large amounts of text. Douglas Preston and 16 other authors, including George R.R. Martin, Jodi Picoult, and Jonathan Franzen, were convinced that their novels had been used to train GPT without their permission. So, in September, they sued OpenAI for copyright infringement.
This sort of thing seems to be happening a lot lately: one giant tech company or another "moves fast and breaks things," exploring the edges of what might or might not be allowed without first asking permission. On today's show, we try to make sense of what OpenAI allegedly did by training its AI on massive amounts of copyrighted material. Was that good? Was it bad? Was it legal?
Help support Planet Money and get bonus episodes by subscribing to Planet Money+ in Apple Podcasts or at plus.npr.org/planetmoney.