Open a I, Google and Others Won't Publish the Code, Model or Training Data.

Researchers have reported that they can extract sensitive data used to train large language models by posing careful questions. They retrieved personal contact information that gpt two had memorized verbatum. The best defence, they write, is simply to limit the sensitive information in the training data. All of these concerns suggest that, at a minimum, researchers should publicly document the training data that goes into their models. Some university teams andfir s, including gogle and facebook, have done this, but others, including invidia, microsoft and open a i, have not.

Play episode from 17:12

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app