[Exclusive] Zilliz Roundtable // Why Purpose-built Vector Databases Matter for Your Use Case

Mar 15, 2024

Guest

Frank Liu

Engineers from Zilliz discuss the importance of purpose-built vector databases for AI applications. They cover challenges with large language models and solutions for efficient retrieval tasks. The podcast also explores upcoming features in Milvus 2.4, including hybrid search capabilities, and data management strategies in vector databases.

Chapters

Episode notes

Introduction

00:00 • 2min

Exploring the Role of Purpose-built Vector Databases in Various Use Cases

01:37 • 6min

Challenges and Solutions in Maximizing Large Language Models for Retrieval Tasks

07:20 • 28min

Hybrid and Multi-Vector Search Features in Milvus 2.4

35:14 • 15min

Discussing Reranker Models, Late Interaction Models, and Embedding Models for Search Refinement

49:54 • 3min

Discussions on Multi-vector Search, Upcoming Features, and Data Management in Vector Databases

52:52 • 6min

Frank Liu is the Director of Operations & ML Architect at Zilliz, where he serves as a maintainer for the Towhee open-source project. Jiang Chen is the Head of AI Platform and Ecosystem at Zilliz. Yujian Tang is a Developer Advocate at Zilliz. He has a background as a software engineer working on AutoML at Amazon.

MLOps Coffee Sessions special episode with Zilliz, Why Purpose-built Vector Databases Matter for Your Use Case, fueled by our Premium Brand Partner, Zilliz.

An engineering deep-dive into the world of purpose-built databases optimized for vector data. In this live session, we explore why non-purpose-built databases fall short in handling vector data effectively and discuss real-world use cases demonstrating the transformative potential of purpose-built solutions. Whether you're a developer, data scientist, or database enthusiast, this virtual roundtable offers valuable insights into harnessing the full potential of vector data for your projects.

// Bio

Frank Liu
Frank Liu is Head of AI & ML at Zilliz, with over eight years of industry experience in machine learning and hardware engineering. Before joining Zilliz, Frank co-founded Orion Innovations, an IoT startup based in Shanghai, and worked as an ML Software Engineer at Yahoo in San Francisco. He presents at major industry events like the Open Source Summit and writes tech content for leading publications such as Towards Data Science and DZone. His passion for ML extends beyond the workplace; in his free time, he trains ML models and experiments with unique architectures. Frank holds MS and BS degrees in Electrical Engineering from Stanford University.

Jiang Chen
Jiang Chen is the Head of AI Platform and Ecosystem at Zilliz. With years of experience in data infrastructures and information retrieval, Jiang previously served as a tech lead and product manager for Search Indexing at Google. Jiang holds a Master's degree in Computer Science from the University of Michigan, Ann Arbor.

Yujian Tang
Yujian Tang is a Developer Advocate at Zilliz. He has a background as a software engineer working on AutoML at Amazon. Yujian studied Computer Science, Statistics, and Neuroscience, with research papers published at conferences including IEEE Big Data. He enjoys drinking bubble tea, spending time with family, and being near water.

// MLOps Jobs board
https://mlops.pallet.xyz/jobs

// MLOps Swag/Merch
https://mlops-community.myshopify.com/

// Related Links
Website: https://zilliz.com/
Neural Priming for Sample-Efficient Adaptation: https://arxiv.org/abs/2306.10191
LIMA: Less Is More for Alignment: https://arxiv.org/abs/2305.11206
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT: https://arxiv.org/abs/2004.12832
Milvus Vector Database by Zilliz: https://zilliz.com/what-is-milvus

--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/

Timestamps:
[00:00] Demetrios' musical intro
[04:36] Vector Databases vs. LLMs
[07:51] Relevance Over Speed
[12:55] Pipelines
[16:19] Vector Databases Integration Benefits
[26:42] Database Diversity Market
[27:38] Milvus vs. Pinecone
[30:22] Vector DB for Training & Deployment
[34:32] Future-proofing AI applications
[45:16] Data Size and Quality
[48:53] ColBERT Model
[54:25] Vector Data Consistency Best Practices
[57:24] Wrap up