

#027 Building the database for AI, Multi-modal AI, Multi-modal Storage
5 snips Oct 23, 2024
Chang She, CEO of LanceDB and co-creator of the pandas library, shares insights on building LanceDB for AI data management. He discusses how LanceDB tackles data bottlenecks and speeds up machine learning experiments with unstructured data. The conversation covers the decision to use Rust for better performance and the claim of results up to 1,000 times faster than Parquet. Chang also explores the challenges of multimodal AI, future applications of LanceDB in recommendation systems, and the vision for more composable data infrastructure.
AI Snips
Multimodal Embedding Challenges
- Multimodal embeddings present storage and serving challenges due to varying access patterns.
- Different storage solutions are often needed for source data, embeddings, and metadata, complicating multimodal data management.
Origin of LanceDB
- At Tubi TV and Cruise, Chang She and his co-founder saw ML teams struggle to manage large-scale unstructured data for multimodal AI.
- This led them to create LanceDB to simplify the data infrastructure for this new era of machine learning.
Rust Adoption
- LanceDB's team initially used C++ but switched to Rust in late 2022 after a hackathon project.
- Rewriting the C++ code in Rust took only three weeks, demonstrating increased productivity and code safety.