Audio long-read: Rise of the robo-writers

Nature Podcast

OpenAI, Google and Others Won't Publish the Code, Model or Training Data

Researchers have reported that they can extract sensitive data used to train large language models by posing careful questions. They retrieved personal contact information that GPT-2 had memorized verbatim. The best defence, they write, is simply to limit the sensitive information in the training data. All of these concerns suggest that, at a minimum, researchers should publicly document the training data that goes into their models. Some university teams and firms, including Google and Facebook, have done this, but others, including NVIDIA, Microsoft and OpenAI, have not.
