
Audio long-read: Rise of the robo-writers
Nature Podcast
OpenAI, Google and Others Won't Publish the Code, Model or Training Data.
Researchers have reported that they can extract sensitive data used to train large language models by posing careful questions. They retrieved personal contact information that GPT-2 had memorized verbatim. The best defence, they write, is simply to limit the sensitive information in the training data. All of these concerns suggest that, at a minimum, researchers should publicly document the training data that goes into their models. Some university teams and firms, including Google and Facebook, have done this, but others, including NVIDIA, Microsoft and OpenAI, have not.