

914: Data Lakes 101 (and Why They’re Key for AI Models), with Oz Katz
91 snips Aug 15, 2025
Oz Katz, Cofounder and CTO of lakeFS, shares his expertise on data lakes, essential for modern AI applications. He highlights the differences between data lakes and data warehouses, emphasizing their roles in managing complex data infrastructures. Katz discusses lakeFS's collaboration with Legofest, the challenges of handling multimodal data, and how version control can enhance team collaboration. He also explores the revolutionary shift towards object storage and the integration of vector databases to improve data accessibility and efficiency.
AI Snips
Time Zone Travel Anecdote
- Oz Katz mentions he was in San Francisco en route to Brisbane after recording.
- He uses the fact that the International Date Line is not a straight line to illustrate time zone oddities.
Data Lake Is A Shared Central Folder
- A data lake is essentially a shared central folder where many data streams converge and teams collaborate.
- It prioritizes flexibility over rigid structure so teams can move faster with heterogeneous data.
Lakes vs Warehouses: Flexibility Tradeoff
- Data warehouses enforce rigid tabular schemas and rely on gatekeepers, while lakes let you ingest messy data quickly.
- That mess can be valuable because it enables faster exploration and use of diverse sources.
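The tradeoff above can be sketched in a few lines of Python. This is a minimal illustration, not anything shown in the episode: the schema, the function names (`warehouse_insert`, `lake_ingest`, `lake_read_amounts`), and the use of in-memory lists in place of real storage are all assumptions made for the example. The warehouse-style path validates on write; the lake-style path stores raw data and interprets it only on read.

```python
import json

# Hypothetical fixed schema a warehouse table might enforce (illustrative only).
WAREHOUSE_SCHEMA = {"user_id": int, "amount": float}


def warehouse_insert(table, record):
    """Schema-on-write: reject any record that does not match the schema."""
    if set(record) != set(WAREHOUSE_SCHEMA):
        raise ValueError(f"unexpected columns: {sorted(record)}")
    for column, expected_type in WAREHOUSE_SCHEMA.items():
        if not isinstance(record[column], expected_type):
            raise ValueError(f"{column} must be {expected_type.__name__}")
    table.append(record)


def lake_ingest(lake, raw_line):
    """Lake-style ingest: keep the raw payload as-is, with no validation."""
    lake.append(raw_line)


def lake_read_amounts(lake):
    """Schema-on-read: interpret raw records only when a consumer needs them."""
    amounts = []
    for raw_line in lake:
        try:
            record = json.loads(raw_line)
        except json.JSONDecodeError:
            continue  # messy data stays in the lake; this reader skips it
        if isinstance(record, dict) and "amount" in record:
            try:
                amounts.append(float(record["amount"]))
            except (TypeError, ValueError):
                continue  # value present but not numeric for this reader
    return amounts
```

In this sketch the lake accepts everything, including malformed lines, so ingestion never blocks; the cost is that each reader has to decide how to handle the mess.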