Improving Text Embeddings with Large Language Models

Book • 2025

Author

Linjun Yang

This paper introduces a method for improving text embeddings using large language models (LLMs) and synthetic training data.

The approach generates diverse synthetic data and fine-tunes open-source language models on it, yielding high-quality text embeddings without relying on labeled data.
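The page does not spell out the fine-tuning objective. The following is a minimal sketch, assuming an InfoNCE-style contrastive loss with in-batch negatives, a common objective for training embedding models on (query, relevant document) pairs such as synthetic ones. Each query should score its own document higher than every other document in the batch. The function names and the default temperature of 0.05 are illustrative assumptions, not values taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(queries: List[Vector], docs: List[Vector],
                  temperature: float = 0.05) -> float:
    """Mean InfoNCE loss (hypothetical helper): docs[i] is the positive for
    queries[i], and every other doc in the batch acts as a negative.
    The temperature default is an illustrative assumption."""
    if not queries or len(queries) != len(docs):
        raise ValueError("need a non-empty batch of aligned query/doc pairs")
    total = 0.0
    for i, q in enumerate(queries):
        # Temperature-scaled similarity of query i to every document in the batch.
        logits = [cosine(q, d) / temperature for d in docs]
        # Numerically stable log-sum-exp over the batch (softmax denominator).
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability of the matching document.
        total += log_denominator - logits[i]
    return total / len(queries)


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[0.9, 0.1], [0.1, 0.9]]   # docs[i] matches queries[i]
    swapped = [aligned[1], aligned[0]]   # positives deliberately mismatched
    print(info_nce_loss(queries, aligned) < info_nce_loss(queries, swapped))
```

Running the script prints `True`: correctly paired embeddings give a much lower loss than mismatched ones, which is the signal that pulls each query toward its own document during fine-tuning. A real training setup would compute this over model outputs with an autodiff framework rather than plain Python lists.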

The method performs strongly on competitive benchmarks and, when the synthetic data is combined with labeled data, sets new state-of-the-art results.

Mentioned by

Mentioned in 1 episode

Mentioned in relation to improving text embeddings using large language models.

#149 - Reflecting on 2023, Midjourney v6, Anthropic Revenue, Unified-IO 2, NY Times sues OpenAI