
AI Breakdown arXiv preprint - Is Cosine-Similarity of Embeddings Really About Similarity?
In this episode, we discuss Is Cosine-Similarity of Embeddings Really About Similarity? by Harald Steck, Chaitanya Ekanadham, and Nathan Kallus. The paper examines the common practice of using cosine-similarity to quantify semantic similarity between embedding vectors in high-dimensional spaces and uncovers pitfalls when it is applied to embeddings learned by regularized linear models. An analytical study of these models shows that cosine-similarity can yield arbitrary, and in some cases non-unique, similarity scores, with the choice of regularization implicitly shaping the results. In light of these findings, the authors caution against the uncritical use of cosine-similarity on embeddings from deep learning models as well, and suggest considering alternative approaches so that assessments of semantic similarity remain valid and interpretable.

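The non-uniqueness is easiest to see in the matrix-factorization setting the paper analyzes: if a model's fit depends only on the product A Bᵀ, the latent dimensions can be rescaled arbitrarily without changing any prediction, yet the cosine similarities computed from the embedding rows do change. Below is a minimal NumPy sketch of that rescaling freedom; the random toy matrices and the cosine_sim helper are our own illustration of the general point, not code or data from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_sim(U):
    """Pairwise cosine similarities between the rows of U."""
    Un = U / np.linalg.norm(U, axis=1, keepdims=True)
    return Un @ Un.T

# Toy "learned" embeddings from a hypothetical factorization X_hat = A @ B.T
A = rng.normal(size=(4, 3))   # e.g. 4 users, 3 latent dimensions
B = rng.normal(size=(5, 3))   # e.g. 5 items, 3 latent dimensions

# Rescale the latent dimensions by an arbitrary diagonal matrix D.
# Predictions are unchanged because (A D)(B D^-1)^T = A B^T.
D = np.diag([10.0, 1.0, 0.1])
A_scaled = A @ D
B_scaled = B @ np.linalg.inv(D)

print(np.allclose(A @ B.T, A_scaled @ B_scaled.T))  # True: identical predictions

# ...but the cosine similarities between item embeddings differ.
print(np.round(cosine_sim(B), 2))
print(np.round(cosine_sim(B_scaled), 2))
```

Because any such D leaves the model's predictions untouched, the cosine similarities between embedding rows are not pinned down by the training objective alone, which is the sense in which the paper calls them arbitrary; the regularization scheme determines which rescaling is implicitly picked.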