#61: Pirated Books Are Powering Generative AI, the 2023 State of Marketing AI Report, and GPT-3.5 Fine-Tuning Is Here

15 snips

Aug 29, 2023

Dive into the world of generative AI, where pirated books are surprisingly powering language models! Discover insights from the creator of a controversial dataset aimed at leveling the playing field for developers. The conversation shifts to the ethical dilemmas of copyright as marketers face challenges integrating AI tools. Plus, learn about OpenAI's fine-tuning advancements for tailored business solutions and the democratization of coding through innovative AI models. Explore Google's AI content strategies and Elon Musk's game-changing self-driving tech!

Ask episode

AI Snips

Chapters

Transcript

Episode notes

INSIGHT

Pirated Books Powering LLMs

Pirated books, including a dataset called Books3, have been used to train large language models (LLMs) like Meta's LLaMA.
This practice, now confirmed by investigative journalism, raises significant copyright concerns and has prompted lawsuits from authors.

INSIGHT

Quality Training Data

High-quality training data, like professionally published books, may explain the strong writing abilities of LLMs.
By weighting superior content more heavily than average internet text, models learn to write like the best human writers.

ADVICE

Licensing Content

Licensing high-quality content, including books, is a more ethical and sustainable approach for training future LLMs.
This proactive measure addresses copyright concerns and allows models to learn from the best examples of writing.

Get the Snipd Podcast app to discover more snips from this episode

Get the app

Pirated books are powering generative AI

The Atlantic just released a major investigative journalism piece that proves popular large language models, like Meta’s LLaMA, have been using pirated books to train their models—a fact that was previously alleged by multiple authors in multiple lawsuits against AI companies.

The article states, “Upwards of 170,000 books, the majority published in the past 20 years, are in LLaMA’s training data. . . . These books are part of a dataset called “Books3,” and its use has not been limited to LLaMA. Books3 was also used to train Bloomberg’s BloombergGPT, EleutherAI’s GPT-J—a popular open-source model—and likely other generative-AI programs now embedded in websites across the internet.”

According to an interview in the story with the creator of the Books3 dataset of pirated books, it appears Books3 was created with altruistic intentions. Reisner interviewed the independent developer of Books3, Shawn Presser, who said he created the dataset to give independent developers “OpenAI-grade training data,” in fear of large AI companies having a monopoly over generative AI tools.

The 2023 State of Marketing AI Report findings

Marketing AI Institute, in partnership with Drift, just released our third-annual State of Marketing AI Report. The 2023 State of Marketing AI Report contains responses from 900+ marketers on AI understanding, usage, and adoption. In it, we’ve got tons of insights on how marketers understand, use, and buy AI technology, the top outcomes marketers want from AI, the top barriers they face when adopting AI, how the industry feels about AI's impact on jobs and society, who owns AI within companies, and much more. Paul and Mike talk about some of the most interesting findings from the data.

You can now fine-tune GPT-3.5 Turbo

OpenAI just announced a big update: You can now fine-tune GPT-3.5 Turbo to your own use cases. This means you can customize the base GPT-3.5 Turbo model to your own needs, so they perform much better on use cases that may be custom to your organization’s specific needs. For instance, you might fine-tune GPT-3.5 Turbo to better understand text that’s highly specific to your industry or business. You might also fine-tune models to sound more like your brand in their outputs or remember specific examples or preferences when producing outputs, so you don’t have to spend resources and bandwidth on highly complex prompts every time you use a model. Notably, OpenAI says: “Early tests have shown a fine-tuned version of GPT-3.5 Turbo can match, or even outperform, base GPT-4-level capabilities on certain narrow tasks.” They also note fine-tuning for GPT-4 will be coming this fall.

Plus…the rapid-fire topics this week are interesting, so stick around for the full episode.

Listen to the full episode of the podcast: https://www.marketingaiinstitute.com/podcast-showcase

Want to receive our videos faster? SUBSCRIBE to our channel!

Visit our website: https://www.marketingaiinstitute.com

Receive our weekly newsletter: https://www.marketingaiinstitute.com/newsletter-subscription

Looking for content and resources?

Come to our next Marketing AI Conference: www.MAICON.ai

Enroll in AI Academy for Marketers: https://www.marketingaiinstitute.com/academy/home

Join our community:

Slack: https://www.marketingaiinstitute.com/slack-group-form

LinkedIn: https://www.linkedin.com/company/mktgai

Twitter: https://twitter.com/MktgAi

Instagram: https://www.instagram.com/marketing.ai/

Facebook: https://www.facebook.com/marketingAIinstitute