Weaviate Podcast

Arctic Embed with Luke Merrick, Puxuan Yu, and Charles Pierse - Weaviate Podcast #110!

Dec 18, 2024
Join Luke Merrick from Snowflake, a key player in Arctic Embed development, and Charles Pierse, head of Weaviate Labs, as they dive into the intricacies of multilingual text embeddings. They explore the evolution of Arctic Embed 2.0, emphasizing its open-source nature. The conversation covers technical strategies in model training, the economics of pre-training large models, and the challenges of integrating negative examples. They discuss the delicate balance between model simplicity and nuance in retrieval, promoting collaboration to enhance search quality.
01:33:39

Podcast summary created with Snipd AI

Quick takeaways

  • The Arctic Embed series, which grew out of Snowflake's acquisition of Neeva, emphasizes the pivotal role of embedding models in enhancing search quality.
  • A user-centric approach in selecting embedding models fosters community trust, balancing performance and parameter count to optimize various applications.

Deep dives

Introduction to Arctic Embed Models

The Arctic Embed series of text embedding models originated at Snowflake after its acquisition of Neeva, a search company. The acquisition led to the development of Cortex Search, a managed search offering within Snowflake. Early experiments showed that the embedding model had the largest impact on search quality, so the team concentrated its effort there. The team then recognized the potential of open-source models, aiming to create a trusted community resource while still providing a premium managed service.
