How AI Is Built

#027 Building the database for AI, Multi-modal AI, Multi-modal Storage

Oct 23, 2024
Chang She, CEO of LanceDB and co-creator of the pandas library, shares insights on building LanceDB for AI data management. He discusses how LanceDB tackles data bottlenecks and speeds up machine learning experiments with unstructured data. The conversation covers the decision to use Rust for performance, with reported results of up to 1,000 times faster than Parquet. Chang also explores the challenges of multimodal AI, future applications of LanceDB in recommendation systems, and the vision for more composable data infrastructure.
INSIGHT

Multimodal Embedding Challenges

  • Multimodal embeddings present storage and serving challenges due to varying access patterns.
  • Different storage solutions are often needed for source data, embeddings, and metadata, complicating multimodal data management.
ANECDOTE

Origin of LanceDB

  • Chang She and his co-founder experienced pain points with ML teams managing large-scale unstructured data for multimodal AI at Tubi TV and Cruise.
  • This led them to create LanceDB to simplify the data infrastructure for this new era of machine learning.
ANECDOTE

Rust Adoption

  • LanceDB's team initially used C++ but switched to Rust in late 2022 after a hackathon project.
  • Rewriting the C++ code in Rust took only three weeks, demonstrating increased productivity and code safety.