Audio long-read: Rise of the robo-writers

Nature Podcast

CHAPTER

OpenAI, Google and Others Won't Publish the Code, Model or Training Data.

Researchers have reported that they can extract sensitive data used to train large language models by posing careful questions. They retrieved personal contact information that GPT-2 had memorized verbatim. The best defence, they write, is simply to limit the sensitive information in the training data. All of these concerns suggest that, at a minimum, researchers should publicly document the training data that goes into their models. Some university teams and firms, including Google and Facebook, have done this, but others, including Nvidia, Microsoft and OpenAI, have not.
