OpenAI, Google and Others Won't Publish the Code, Model or Training Data.
Researchers have reported that they can extract sensitive data used to train large language models by posing careful questions. They retrieved personal contact information that GPT-2 had memorized verbatim. The best defence, they write, is simply to limit the sensitive information in the training data. All of these concerns suggest that, at a minimum, researchers should publicly document the training data that goes into their models. Some university teams and firms, including Google and Facebook, have done this, but others, including NVIDIA, Microsoft and OpenAI, have not.