Mind the Distribution Shift: Impact on Model Performance
Retrieval models, such as embedding models and related architectures, can be highly sensitive to distribution shift. Many teams use the same approach to generate both their training data and their evaluation data, which leads to inflated performance numbers that may not hold up in practice. Companies promoting this approach may be overlooking how sensitive models are to changes in the query profile. A language model trained only on clean text, free of spelling and grammar mistakes, can perform poorly when it faces real user queries, which are typically riddled with errors. This mismatch in data quality can leave a fine-tuned model performing worse than its base version.
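As a rough illustration of this sensitivity, the sketch below compares retrieval accuracy of an off-the-shelf embedding model on clean queries versus the same queries with synthetic typos. The corpus, queries, and the add_typos helper are hypothetical stand-ins for a real evaluation set, not something described in the source; the model name is just a common default.

```python
import random
from sentence_transformers import SentenceTransformer, util

def add_typos(text: str, rate: float = 0.15, seed: int = 0) -> str:
    """Randomly drop characters to mimic noisy real-user queries (hypothetical noise model)."""
    rng = random.Random(seed)
    return "".join(ch for ch in text if rng.random() > rate)

# Hypothetical corpus and queries standing in for a real evaluation set.
corpus = ["how to reset a router", "symptoms of the flu", "python list comprehension"]
queries = ["reset my router", "flu symptoms", "list comprehension in python"]
relevant = [0, 1, 2]  # index of the relevant corpus document for each query

model = SentenceTransformer("all-MiniLM-L6-v2")
corpus_emb = model.encode(corpus, convert_to_tensor=True)

def recall_at_1(query_texts):
    """Fraction of queries whose top-1 retrieved document is the relevant one."""
    query_emb = model.encode(query_texts, convert_to_tensor=True)
    top_hits = util.cos_sim(query_emb, corpus_emb).argmax(dim=1)
    return sum(int(top_hits[i]) == relevant[i] for i in range(len(query_texts))) / len(query_texts)

print("clean queries:", recall_at_1(queries))
print("noisy queries:", recall_at_1([add_typos(q) for q in queries]))
```

If the gap between the clean and noisy scores is large, the model (or the evaluation set) is likely too tuned to clean text and will degrade on real traffic; a more realistic benchmark would include the kinds of errors actual users make.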