Building the database for AI, Multi-modal AI, Multi-modal Storage | S2 E10
Oct 23, 2024
Chang She, CEO of LanceDB and co-creator of the pandas library, shares insights on building LanceDB for AI data management. He discusses how LanceDB tackles data bottlenecks and speeds up machine learning experiments with unstructured data. The conversation dives into the decision to use Rust for enhanced performance, achieving up to 1,000 times faster results than Parquet. Chang also explores multimodal AI's challenges, future applications of LanceDB in recommendation systems, and the vision for more composable data infrastructures.
LanceDB addresses data management complexities in AI by optimizing for large-scale vector storage, enabling rapid access and effective training.
The transition from C++ to Rust significantly enhanced LanceDB's development speed and safety, improving productivity in AI data infrastructure management.
Deep dives
Challenges of Multimodal Embeddings in AI
Working with multimodal embeddings in AI presents significant challenges, primarily related to storage and data access patterns. In enterprise settings, it's common to employ multiple storage solutions concurrently, such as using blob storage for source data and a vector database for embeddings, leading to complexities in data management. Additionally, effective training requires random data access, filtering, and even stratified sampling, which complicates the ability to manage evolving datasets. Solutions for these challenges focus on optimizing large-scale storage to meet the needs of AI applications, which often involve diverse access requirements.
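One of the access patterns mentioned above, stratified sampling, can be illustrated with a short standalone sketch. This is not LanceDB code, just a hypothetical helper showing why training pipelines need more than sequential reads: the same fraction of records is drawn from every label group so that rare classes are not drowned out.

```python
import random
from collections import defaultdict

def stratified_sample(records, label_key, frac, seed=0):
    """Draw the same fraction of records from each label group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[rec[label_key]].append(rec)
    sample = []
    for items in groups.values():
        k = max(1, round(len(items) * frac))
        sample.extend(rng.sample(items, k))
    return sample

# Toy dataset: 100 image records, evenly split between two classes.
data = [{"id": i, "label": "cat" if i % 2 else "dog"} for i in range(100)]
subset = stratified_sample(data, "label", frac=0.1)  # 5 cats + 5 dogs
```

A storage format that only supports full scans makes this kind of grouped, random-access read expensive at terabyte scale, which is the gap the episode says LanceDB targets.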
Introducing LanceDB and Its Core Features
LanceDB addresses the complexities associated with multimodal data by optimizing for large-scale vector storage and retrieval. It leverages a unique open table format that integrates metadata with raw data, allowing users to store immense datasets efficiently while enabling rapid access through various query types. Users familiar with Python's data tools can easily transition to LanceDB as it allows for simple data insertion and retrieval operations, facilitating extensive data analyses across different scales, from local experiments to massive datasets. This capability is underscored by its ability to store both the embeddings and their training data within a single source of truth, simplifying data synchronization.
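The "single source of truth" idea, embeddings stored alongside the raw data they describe, can be sketched in plain Python. The table layout and brute-force cosine search below are illustrative assumptions, not the real LanceDB API (a production system would use an approximate-nearest-neighbour index rather than a linear scan).

```python
import math

# Toy "table": each row keeps metadata and its embedding together,
# mirroring the single-source-of-truth layout described above.
table = [
    {"id": "a", "caption": "a red car",    "vec": [1.0, 0.0, 0.0]},
    {"id": "b", "caption": "a blue car",   "vec": [0.9, 0.1, 0.0]},
    {"id": "c", "caption": "a green tree", "vec": [0.0, 0.0, 1.0]},
]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def search(rows, query_vec, k=2):
    """Brute-force vector search: rank rows by cosine similarity."""
    ranked = sorted(rows, key=lambda r: cosine(r["vec"], query_vec), reverse=True)
    return ranked[:k]

hits = search(table, [1.0, 0.05, 0.0])  # nearest rows: "a", then "b"
```

Because each hit already carries its caption and id, no second lookup against a separate blob store is needed, which is the synchronization headache the paragraph above describes.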
Utilizing Rust for Enhanced Performance
The transition from C++ to Rust significantly improved development speed and code safety for LanceDB, enhancing overall productivity in managing AI data infrastructure. The decision stemmed from frustrations experienced during the development of their core Lance columnar format, with Rust enabling faster, safer code execution. Many companies in AI infrastructure are increasingly adopting Rust for its agility and reliability, helping to establish a richer ecosystem that complements existing user-friendly languages like Python. This integration could yield performance improvements across a variety of applications, making it an attractive choice for developers.
The Future of Multimodal AI with LanceDB
The implications of using LanceDB extend to future applications in multimodal AI, where efficient data management becomes vital for model training and deployment. For example, in real-time applications like autonomous vehicles, the ability to quickly access and process diverse data types could streamline model fine-tuning processes. Furthermore, LanceDB's functionalities can support the development of robust recommendation systems by integrating various retrieval techniques and feedback loops for continuous improvement. As AI teams increasingly demand efficient storage and processing solutions, LanceDB is poised to play a crucial role in evolving data management landscapes.
Imagine a world where data bottlenecks, slow data loaders, or memory issues on the VM don't hold back machine learning.
Machine learning and AI success depends on the speed at which you can iterate. LanceDB is here to enable fast experiments on top of terabytes of unstructured data. It is the database for AI. Dive with us into how LanceDB was built, what went into the decision to use Rust as the main implementation language, the potential of AI on top of LanceDB, and more.
"LanceDB is the database for AI...to manage their data, to do a performant billion scale vector search."
“We're big believers in the composable data systems vision."
"You can insert data into LanceDB using pandas DataFrames...to sort of really large 'embed the internet' kind of workflows."
"We wanted to create a new generation of data infrastructure that makes their [AI engineers] lives a lot easier."
"LanceDB offers up to 1,000 times faster performance than Parquet."
00:00 Introduction to Multimodal Embeddings
00:26 Challenges in Storage and Serving
02:51 LanceDB: The Solution for Multimodal Data
04:25 Interview with Chang She: Origins and Vision
10:37 Technical Deep Dive: LanceDB and Rust
18:11 Innovations in Data Storage Formats
19:00 Optimizing Performance in Lakehouse Ecosystems
21:22 Future Use Cases for LanceDB
26:04 Building Effective Recommendation Systems
32:10 Exciting Applications and Future Directions