Is There a Difference Between Pre-Training Data and Pre-Teaching Data?
Another paper, out of Yoav Goldberg's group, had a similar sort of intuition: look at things in the data and try to figure out why the model has certain biases or makes certain errors. So if you ask where Barack Obama was born, the model tends to say Chicago, or sometimes Washington, depending on how you phrase it. I think to be able to answer this question, you have to go back to the pre-training data and try to see what it even saw.
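As a rough illustration of what "going back to the pre-training data" could look like, here is a minimal Python sketch that counts how often a subject is mentioned in the same document as each candidate answer. It is not the method from the paper discussed above. The function name `cooccurrence_counts` and the toy corpus are hypothetical, made up for this example, and a real analysis would run over an actual pre-training corpus with proper entity matching rather than substring checks.

```python
from collections import Counter


def cooccurrence_counts(documents, subject, candidates):
    """Count documents that mention the subject together with each candidate.

    Hypothetical helper for illustration: uses case-insensitive substring
    matching, which is far cruder than real entity linking.
    """
    counts = Counter({candidate: 0 for candidate in candidates})
    subject_lc = subject.lower()
    for doc in documents:
        text = doc.lower()
        if subject_lc not in text:
            continue  # the subject is not mentioned in this document
        for candidate in candidates:
            if candidate.lower() in text:
                counts[candidate] += 1
    return counts


# Toy corpus invented for this sketch; it is not real pre-training data.
toy_corpus = [
    "Barack Obama began his political career in Chicago.",
    "Barack Obama was a senator from Illinois, based in Chicago.",
    "Barack Obama lived in Washington while he was president.",
    "Barack Obama was born in Honolulu, Hawaii.",
    "Chicago is known for its architecture.",
]

counts = cooccurrence_counts(
    toy_corpus, "Barack Obama", ["Chicago", "Washington", "Honolulu"]
)
print(counts.most_common())
```

In this toy corpus the subject appears alongside "Chicago" more often than alongside the correct birthplace, which is the kind of skew that could push a model toward the wrong answer.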