
The Analytics Engineering Podcast: Building a multimodal lakehouse for AI (w/ Chang She)
Nov 23, 2025
In this discussion, Chang She, co-creator of pandas and CEO of LanceDB, dives into the future of AI data infrastructure. He shares his journey from finance to tech and the challenges of building a multimodal lakehouse for AI. Chang explains the limitations of Parquet for AI workloads and introduces the Lance file format. He emphasizes the need for unified data retrieval systems that can handle the diverse, increasingly complex data types driven by AI and agents.
AI Snips
Pandas Origin Story
- Chang She co-authored pandas after running into analytics pain points at a hedge fund, where he worked with Wes McKinney.
- The name pandas comes from "panel data," an econometrics term in use at AQR, where they both worked.
Vectors As A Universal Model Language
- Vectors are high-dimensional numeric representations that let models understand diverse data types in a consistent way.
- Chang She argues vectors enable semantic similarity and unify multimodal data for AI models.
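To make the idea of a shared vector space concrete, here is a minimal sketch in Python. The three-dimensional vectors and item names are hypothetical, hand-picked for illustration; real embeddings come from a trained model and typically have hundreds or thousands of dimensions. The point is only that once text, image, and audio items are all vectors, one similarity function can compare any pair of them.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: invented values, not produced by any real model.
text_puppy = [0.9, 0.1, 0.0]    # text: "a puppy playing fetch"
image_dog = [0.8, 0.2, 0.1]     # image: a dog in a park
audio_stocks = [0.0, 0.1, 0.9]  # audio: a stock market report

# The same function compares items regardless of their original modality.
sim_text_image = cosine_similarity(text_puppy, image_dog)
sim_text_audio = cosine_similarity(text_puppy, audio_stocks)

print(f"text vs image: {sim_text_image:.3f}")
print(f"text vs audio: {sim_text_audio:.3f}")
```

With these toy values, the dog-related text and image score close to 1, while the unrelated audio clip scores close to 0.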
Vector Search vs Text Search
- Vector search finds semantically similar items by proximity in high-dimensional space rather than keyword matches.
- Chang She contrasts this with full-text search, which matches syntactic tokens but misses semantic neighbors.
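The contrast can be sketched with a toy example in Python. Everything here is an assumption made for illustration: the documents, the two-dimensional vectors, and the helper names `keyword_search` and `vector_search` are hypothetical and are not LanceDB's API. A production system would use model-generated embeddings and an approximate nearest-neighbor index rather than a brute-force scan.

```python
import math

# Hypothetical corpus: (title, toy embedding). Vectors are invented by hand.
DOCS = [
    ("Bicycle repair basics", [0.90, 0.10]),
    ("Fixing a punctured bike wheel", [0.85, 0.20]),
    ("Pasta recipes for dinner", [0.05, 0.95]),
]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def keyword_search(query, docs):
    """Return titles that share at least one literal token with the query."""
    query_tokens = set(query.lower().split())
    return [title for title, _ in docs
            if query_tokens & set(title.lower().split())]

def vector_search(query_vector, docs, k=2):
    """Return the k titles whose vectors are closest to the query vector."""
    ranked = sorted(docs,
                    key=lambda doc: cosine_similarity(query_vector, doc[1]),
                    reverse=True)
    return [title for title, _ in ranked[:k]]

query_text = "bicycle repair"
query_vector = [0.90, 0.15]  # hypothetical embedding of the query text

keyword_hits = keyword_search(query_text, DOCS)
vector_hits = vector_search(query_vector, DOCS, k=2)

print("keyword:", keyword_hits)
print("vector: ", vector_hits)
```

The keyword search finds only the title that literally contains "bicycle" or "repair", while the vector search also surfaces the punctured-wheel document, which shares no tokens with the query but sits nearby in the toy vector space.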

