Tengyu Ma, Co-Founder of Voyage AI and Assistant Professor at Stanford University, joins the Weaviate Podcast to discuss embedding model training, contrastive learning theory, fine-tuning a model for the LangChain documentation, the challenges of serving an embeddings API, and optimizations for query inference and batch embeddings.
Voyage AI focuses on embedding models for horizontal scalability in enterprise AI.
Contrastive learning in text and image embeddings captures similarities and differences effectively.
Optimizing AI models involves tuning hyperparameters, understanding scaling laws, and continual fine-tuning for performance improvement.
Deep dives
Motivation for Starting Voyage AI and Enterprise AI Focus
Tengyu Ma, co-founder of Voyage AI, discusses the motivation behind starting the company in early 2023. He emphasizes contributing to the commercialization of AI, particularly in the direction of enterprise AI, and explains the shift toward focusing on components like embedding models, which scale horizontally across different domains and industries.
Contrastive Learning for Text and Images
Tengyu delves into contrastive learning, particularly for text and image embeddings. He explains how embedding models convert documents or images into vectors, with contrastive loss functions aiming to capture similarities and differences. Tengyu details how training incentivizes similar representations for related pairs and dissimilar representations for random pairs, and discusses applying contrastive learning to text embeddings, emphasizing the challenge of defining similarity for text data.
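As a concrete illustration, here is a minimal sketch of an InfoNCE-style contrastive loss with in-batch negatives, a common formulation for this kind of training; it is illustrative, not Voyage AI's actual training code:

```python
# Minimal sketch of an InfoNCE-style contrastive loss with in-batch
# negatives (illustrative only, not Voyage AI's training code).
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor,
                  doc_emb: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """query_emb, doc_emb: (batch, dim) L2-normalized embeddings where
    query_emb[i] and doc_emb[i] form a positive pair."""
    # Cosine similarity of every query against every document in the batch.
    logits = query_emb @ doc_emb.T / temperature  # (batch, batch)
    # Diagonal entries are the positive pairs; off-diagonal entries act as
    # the "random pairs" the loss pushes apart.
    targets = torch.arange(query_emb.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)
```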
Enhancing Embedding Models and Multi-Vector Representations
The discussion explores advancements in embedding models, including multi-vector representations and scalable architectures. Tengyu highlights the benefits and complexities of representing each document with multiple vectors, one per token, as in ColBERT. He addresses the challenges of maintaining efficiency and quality through data curation, architecture optimization, and hyperparameter tuning. While neural architecture search remains a potential tool, the current focus is on data curation and refining existing architectures for optimal performance.
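For intuition, here is a short sketch of ColBERT-style "late interaction" scoring: each text keeps one vector per token, and relevance is the sum over query tokens of the maximum similarity to any document token (MaxSim). This is the standard published formulation, not a statement about Voyage AI's models:

```python
# ColBERT-style MaxSim scoring over per-token embeddings.
import torch

def maxsim_score(query_vecs: torch.Tensor, doc_vecs: torch.Tensor) -> torch.Tensor:
    """query_vecs: (q_len, dim); doc_vecs: (d_len, dim); both L2-normalized."""
    sim = query_vecs @ doc_vecs.T        # (q_len, d_len) token-level similarities
    return sim.max(dim=1).values.sum()   # best doc token per query token, summed
```

The trade-off mentioned above follows directly: storing d_len vectors per document instead of one improves granularity but multiplies index size and scoring cost.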
The Importance of Hyperparameter Tuning and Scaling Laws in AI Models
Tuning hyperparameters and understanding scaling laws play crucial roles in optimizing AI models. Hyperparameters such as activation functions and model size directly affect performance. Balancing compute resources against accuracy is difficult because hyperparameter settings tuned on smaller models do not always transfer efficiently to larger ones. Scaling laws describe how performance improves with model size; embedding models keep improving with larger sizes, within the latency constraints of serving.
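The canonical functional form behind such scaling-law analyses is a power law in model size with an irreducible floor. A small sketch, with placeholder constants rather than fitted values:

```python
# Power-law scaling form: L(N) = a * N**(-alpha) + floor.
# Constants here are placeholders for illustration, not fitted values.
import numpy as np

def scaling_law_loss(n_params, a=1.0, alpha=0.3, floor=0.1):
    """Predicted loss as a function of parameter count N."""
    return a * np.power(n_params, -alpha) + floor

# Compare predicted loss across candidate sizes before committing compute.
print(scaling_law_loss(np.array([1e8, 1e9, 1e10])))
```

Fitting a, alpha, and floor on small models is what lets teams estimate whether a larger model is worth the extra training and serving cost.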
The Significance of Continual Fine-Tuning and Heterogeneity in Serving Embedding Models
Continual fine-tuning of embedding models on evolving data sets enhances performance. On the serving side, customizing tokens-per-minute (TPM) and requests-per-minute (RPM) limits for diverse users of an embedding API is essential. The backend must adapt to optimize latency and throughput based on batch sizes and how sensitive each user is to response times. Scaling GPU usage and balancing latency needs against throughput requirements are key considerations in serving embedding models effectively.
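To make the TPM/RPM customization concrete, here is a hypothetical token-bucket limiter enforcing per-user request and token budgets on an embeddings API; the class, names, and refill policy are illustrative assumptions, not Voyage AI's implementation:

```python
# Hypothetical token-bucket limiter for per-user RPM/TPM budgets.
import time

class RateLimiter:
    def __init__(self, rpm: int, tpm: int):
        self.rpm, self.tpm = rpm, tpm
        self.requests, self.tokens = float(rpm), float(tpm)
        self.last_refill = time.monotonic()

    def _refill(self) -> None:
        # Continuously refill both buckets in proportion to elapsed time.
        elapsed = time.monotonic() - self.last_refill
        self.requests = min(self.rpm, self.requests + self.rpm * elapsed / 60)
        self.tokens = min(self.tpm, self.tokens + self.tpm * elapsed / 60)
        self.last_refill = time.monotonic()

    def allow(self, num_tokens: int) -> bool:
        """Admit a request costing num_tokens, or reject if over budget."""
        self._refill()
        if self.requests >= 1 and self.tokens >= num_tokens:
            self.requests -= 1
            self.tokens -= num_tokens
            return True
        return False
```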
Voyage AI is the newest giant in the embedding, reranking, and search model game!
I am SUPER excited to publish our latest Weaviate podcast with Tengyu Ma, Co-Founder of Voyage AI and Assistant Professor at Stanford University!
We began the podcast with a deep dive into everything embedding model training and contrastive learning theory. Tengyu delivered a masterclass in everything from scaling laws to multi-vector representations, neural architectures, representation collapse, data augmentation, semantic similarity, and more! I am beyond impressed with Tengyu's extensive knowledge and explanations of all these topics.
The next chapter dives into a case study Voyage AI did fine-tuning an embedding model for the LangChain documentation. This is an absolutely fascinating example of the role of continual fine-tuning with very new concepts (for example, very few people were talking about chaining together LLM calls 2 years ago), as well as the data efficiency advances in fine-tuning.
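For readers curious what continual fine-tuning might look like mechanically, here is a hypothetical sketch: curate (query, passage) positive pairs from the newly written documentation and take a few low-learning-rate contrastive steps. The `model.encode` interface and the pair-curation step are assumptions for illustration; Voyage AI's actual pipeline is not public:

```python
# Hypothetical continual fine-tuning step on newly curated doc pairs.
import torch
import torch.nn.functional as F

def finetune_step(model, optimizer, queries, passages, temperature=0.05):
    """One contrastive step on a small batch of (query, passage) pairs."""
    q = F.normalize(model.encode(queries), dim=-1)   # (batch, dim)
    p = F.normalize(model.encode(passages), dim=-1)  # (batch, dim)
    logits = q @ p.T / temperature                   # in-batch negatives
    targets = torch.arange(q.size(0), device=logits.device)
    loss = F.cross_entropy(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Data efficiency here means a modest number of such pairs can adapt the model to concepts, like chaining LLM calls, that barely existed in the pretraining data.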
We concluded by discussing the ML systems challenges of serving an embeddings API, particularly detecting whether a request is batch or query inference, and the optimizations each path demands: roughly 100 ms latency for a single query embedding versus maximum throughput for batch embeddings.
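A hypothetical sketch of that routing decision: a single short input looks like an interactive query and goes to a low-latency path, while many inputs look like an offline job and get packed into large GPU batches. The thresholds and names below are illustrative assumptions, not Voyage AI's system:

```python
# Hypothetical routing of embedding requests by workload type.
def route_request(texts: list[str]) -> str:
    # One short text: likely an interactive query; flush a tiny batch
    # immediately to stay near the ~100 ms latency target.
    if len(texts) == 1 and len(texts[0]) < 512:
        return "low_latency"
    # Many texts: likely an offline job; accumulate into large batches
    # where per-text throughput matters more than response time.
    return "high_throughput"
```

Behind each path, a dynamic batcher would then trade flush time for batch size accordingly: short windows for the query path, long windows that fill the GPU for the batch path.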