#22 Nils Reimers on the Limits of Embeddings, Out-of-Domain Data, Long Context, Finetuning (and How We're Fixing It) | Search

How AI Is Built

NOTE

Train for Your Domain to Succeed

Text embedding models often perform poorly on data outside their training domain. They do well on tasks that resemble their training sets but struggle on other kinds of text, such as community forums, news articles, or scientific papers, where lexical search is often the more effective choice. Researchers stress out-of-domain performance over in-domain results, because most users have no training data specific to their own use case. The gap is most visible with niche or lesser-known subjects that the model may not recognize, which makes retrieving accurate information harder. Knowing what data an embedding model was trained on is therefore crucial to using it effectively across varied applications.
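
One way to act on this is to measure retrieval quality on a small labelled sample from your own domain before trusting any model. The sketch below is illustrative only and is not from the episode: it builds a minimal BM25 lexical baseline and a recall@k harness in plain Python. The corpus, the queries, and the names `BM25`, `recall_at_k`, and `labelled_queries` are hypothetical; the parameters `k1=1.5` and `b=0.75` are common defaults, assumed here rather than tuned.

```python
import math
from collections import Counter
from typing import Callable, Dict, List


def tokenize(text: str) -> List[str]:
    # Naive whitespace tokenizer; real systems usually need something better.
    return text.lower().split()


class BM25:
    """Minimal BM25 scorer used as a lexical baseline (sketch, not optimized)."""

    def __init__(self, docs: List[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.tf = [Counter(toks) for toks in self.doc_tokens]
        self.avg_len = sum(len(t) for t in self.doc_tokens) / len(self.doc_tokens)
        n = len(docs)
        # Document frequency: in how many documents each term appears.
        df = Counter(term for toks in self.doc_tokens for term in set(toks))
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def scores(self, query: str) -> List[float]:
        # Returns one score per document, in corpus order.
        q_terms = tokenize(query)
        out = []
        for tf, toks in zip(self.tf, self.doc_tokens):
            norm = self.k1 * (1 - self.b + self.b * len(toks) / self.avg_len)
            out.append(sum(
                self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
                for t in q_terms if t in tf
            ))
        return out


def recall_at_k(score_fn: Callable[[str], List[float]],
                queries: Dict[str, int], k: int) -> float:
    """Fraction of queries whose single relevant document is ranked in the top k."""
    hits = 0
    for query, relevant_idx in queries.items():
        s = score_fn(query)
        top = sorted(range(len(s)), key=lambda i: s[i], reverse=True)[:k]
        hits += relevant_idx in top
    return hits / len(queries)


# Hypothetical in-domain sample: each query maps to the index of its relevant doc.
corpus = [
    "reset the zx-400 router by holding the pin for ten seconds",
    "the billing portal shows invoices for the last twelve months",
    "firmware updates for the zx-400 are published every quarter",
]
labelled_queries = {
    "how to reset router": 0,
    "billing invoices": 1,
    "firmware updates quarter": 2,
}

bm25 = BM25(corpus)
lexical_recall = recall_at_k(bm25.scores, labelled_queries, k=1)
print(f"BM25 recall@1: {lexical_recall:.2f}")
```

Because `recall_at_k` accepts any function that maps a query to one score per document, the same harness can score an embedding model's similarity function on the same queries, giving a like-for-like comparison with the lexical baseline on your data.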
