Matryoshka Embeddings with Aditya Kusupati, Zach Nussbaum, and Zain Hasan - Weaviate Podcast #89!

Investing Time in Data Preparation for Model Training

Investing significant time in data work and cleaning is crucial for training embedding models. The process involves two stages: large-scale contrastive pre-training on around 240 million pairs of semantically related sentences, followed by smaller-scale contrastive fine-tuning that adds mined hard negatives. Including hard negatives is what pushes retrieval performance further beyond what in-batch negatives alone achieve.
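To make the two stages concrete, below is a minimal sketch of the contrastive objectives typically used in this setup: an in-batch InfoNCE loss for large-scale pre-training, and a variant that appends mined hard negatives for fine-tuning. This is not the speakers' exact training code; the function names, the temperature value, and the one-hard-negative-per-query layout are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def info_nce_loss(query_emb, pos_emb, temperature=0.05):
    """Pre-training stage: in-batch contrastive loss.

    Each query's positive is the matching row of pos_emb; every other
    row in the batch serves as a negative.
    """
    query_emb = F.normalize(query_emb, dim=-1)
    pos_emb = F.normalize(pos_emb, dim=-1)
    logits = query_emb @ pos_emb.T / temperature              # (B, B) similarity matrix
    labels = torch.arange(query_emb.size(0), device=query_emb.device)
    return F.cross_entropy(logits, labels)


def info_nce_with_hard_negatives(query_emb, pos_emb, hard_neg_emb, temperature=0.05):
    """Fine-tuning stage: in-batch negatives plus one mined hard negative per query.

    The hard negatives are appended as an extra logit column, making the
    classification task harder than in-batch negatives alone.
    """
    query_emb = F.normalize(query_emb, dim=-1)
    pos_emb = F.normalize(pos_emb, dim=-1)
    hard_neg_emb = F.normalize(hard_neg_emb, dim=-1)
    in_batch = query_emb @ pos_emb.T                          # (B, B)
    hard = (query_emb * hard_neg_emb).sum(-1, keepdim=True)   # (B, 1) query vs. its own hard negative
    logits = torch.cat([in_batch, hard], dim=1) / temperature
    labels = torch.arange(query_emb.size(0), device=query_emb.device)
    return F.cross_entropy(logits, labels)


# Example with random embeddings standing in for encoder outputs.
q, p, n = torch.randn(8, 256), torch.randn(8, 256), torch.randn(8, 256)
print(info_nce_loss(q, p), info_nce_with_hard_negatives(q, p, n))
```

In practice the embeddings would come from the model being trained, and at the ~240-million-pair scale the pre-training loss relies on very large batches so that in-batch negatives are plentiful; the fine-tuning loss then supplies the harder, mined negatives on a smaller, cleaner dataset.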
