Mind the Distribution Shift: Impact on Model Performance

Retrieval models, including embedding models, can be highly sensitive to distribution shift. Many people use the same approach to create both their datasets and their training data, which produces inflated performance gains that may not hold up. Companies promoting this approach may be overlooking how sensitive models are to changes in the query profile. Language models trained on clean data, free of spelling and grammar mistakes, can perform poorly on real user queries, which are typically riddled with errors. This mismatch between training data and real queries can leave a tuned model performing worse than its base version.
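One way to probe this sensitivity is to evaluate on deliberately noisy copies of clean queries. The following is a minimal sketch, assuming that swapping adjacent letters is a reasonable stand-in for user typos; the function `add_typos` and its `rate` and `seed` parameters are hypothetical names for illustration, not something described in the episode.

```python
import random


def add_typos(query: str, rate: float = 0.1, seed: int = 0) -> str:
    """Swap adjacent letters at random to mimic user typing errors.

    Assumption: adjacent-letter transposition is only one kind of
    real-world noise; actual user queries contain many other errors.
    """
    rng = random.Random(seed)  # seeded so the noisy set is reproducible
    chars = list(query)
    i = 0
    while i < len(chars) - 1:
        both_letters = chars[i].isalpha() and chars[i + 1].isalpha()
        if both_letters and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return "".join(chars)


clean_queries = ["how to reset my password", "best hiking boots for winter"]
noisy_queries = [add_typos(q, rate=0.15, seed=42) for q in clean_queries]
```

Comparing a model's retrieval metrics on `clean_queries` and `noisy_queries` gives a rough sense of how much it depends on clean input.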
