

Debunking myths of vector search and LLMs with Leo Boytsov
Jan 17, 2025
Leo Boytsov, a Senior Research Scientist at AWS AI Labs who specializes in vector search algorithms, discusses the evolution of retrieval algorithms, the challenges of handling large documents, and how non-metric spaces can enhance similarity representation. He also covers the potential of combining traditional and modern search methods, and the serendipitous discoveries shaping new industries in AI.
AI Snips
Leo's Career Journey
- Leo Boytsov's career began in finance but shifted to his passion, retrieval algorithms.
- He worked at Yandex and PubMed, earned a PhD at CMU, and now does research at AWS, focusing on question answering.
Sparse vs. Dense Vectors
- Combining sparse and dense vector representations can improve retrieval quality.
- However, dense vectors have limitations with long documents and diverse vocabularies.
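One common way to combine a sparse and a dense retriever is to fuse their ranked lists. The sketch below uses reciprocal rank fusion as an illustration; the episode summary does not say which fusion method Leo has in mind, and the document ids, the two input rankings, and the `k=60` default are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    with rank starting at 1. k=60 is a commonly used default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for one query: one list from a sparse (lexical)
# retriever and one from a dense (embedding) retriever.
sparse_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]

fused = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
print(fused)
```

Documents that both retrievers rank highly rise to the top, while a document found by only one retriever still stays in the fused list.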
Limitations of SPLADE
- SPLADE uses subword tokenization, so its sparse vector has a fixed size: one dimension per token in the subword vocabulary.
- This approach limits its ability to handle rare terms and long documents effectively.
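The rare-term limitation can be illustrated with a toy example. The sketch below is not SPLADE itself: the six-entry vocabulary is made up, the tokenizer is a simplified greedy longest-match splitter in the WordPiece style, and raw counts stand in for the term weights that the real model learns. It only shows that the vector length is tied to the vocabulary and that an out-of-vocabulary word is broken into pieces.

```python
def wordpiece_tokenize(word, vocab):
    """Greedy longest-match-first split of one word into subword pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word-internal piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece matched: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


# Hypothetical toy vocabulary; a real model uses tens of thousands of entries.
VOCAB = ["[UNK]", "search", "vector", "bo", "##yt", "##sov"]
INDEX = {token: i for i, token in enumerate(VOCAB)}


def sparse_vector(words):
    """Bag-of-subwords vector with one dimension per vocabulary entry.

    Counts are a stand-in (assumption) for learned term weights.
    """
    vec = [0.0] * len(VOCAB)
    for word in words:
        for piece in wordpiece_tokenize(word, INDEX):
            vec[INDEX[piece]] += 1.0
    return vec


pieces = wordpiece_tokenize("boytsov", INDEX)
vec = sparse_vector(["vector", "search", "boytsov"])
print(pieces)    # the rare word is split into several generic pieces
print(len(vec))  # always len(VOCAB), however many distinct words appear
```

Because the rare word has no dimension of its own, it can only be matched through pieces that it shares with unrelated words.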