
Currents 087: Shivanshu Purohit on Open-Source Generative AI
The Jim Rutt Show
00:00
The Effect of Order of Training on Model Memorization
The order of training actually doesn't affect how the models memorize something. We found out that the model still like memorize a lot of stuff, especially if the data is very sparse. If you don't pre process your text good enough and say for some reason someone's credit card information ended up in your data set, even if it was just one instance the model can more or less verbatim memorize the information. And if you open source it or like give the checkpoint to someone, the other person can just like extract the information if they know what they're doing.
Transcript
Play full episode