How AI Is Built

#9 Jorrit Sandbrink on Modern Data Infrastructure for Analytics and AI, Lakehouses, Open Source Data Stack

May 24, 2024
Jorrit Sandbrink, a data engineer, discusses the lakehouse architecture, which blends the data warehouse and the data lake; key components such as Delta Lake and Apache Spark; optimizations through partitioning strategies; and data ingestion with DLT. The podcast emphasizes open-source solutions, considerations when choosing tools, and the evolving data landscape.
27:53

Podcast summary created with Snipd AI

Quick takeaways

  • Lakehouses offer a powerful and flexible architecture for modern data analytics.
  • Open-source solutions provide cost-effective and customizable alternatives.

Deep dives

Lakehouse Architecture and Technology Choices

Decoupling storage from compute in a lakehouse architecture opens up a series of choices, starting with where the data is stored, typically in the cloud or on-premises. Table formats such as Delta Lake, Apache Iceberg, and Apache Hudi, the subject of an almost religious debate among users, add a metadata layer on top of Parquet files for efficient data management. The episode highlights the importance of choosing a table format such as Delta Lake, with Parquet as the unified underlying file format.
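
To make the "metadata layer on top of Parquet files" idea concrete, here is a minimal sketch in Python. It is a toy model, not the API or on-disk layout of Delta Lake, Iceberg, or Hudi: the `_log` directory, the `commit` and `live_files` helpers, and the `part-N.parquet` names are all hypothetical, and no real Parquet data is written. It only illustrates the general principle that a table is defined by an ordered log of "add file" and "remove file" actions over immutable data files.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def commit(table_dir: Path, actions: list) -> int:
    """Append one commit file to the toy log and return its version number."""
    log_dir = table_dir / "_log"  # hypothetical log location
    log_dir.mkdir(parents=True, exist_ok=True)
    version = len(list(log_dir.glob("*.json")))
    # One JSON action per line; zero-padded names keep commits sorted.
    (log_dir / f"{version:020d}.json").write_text(
        "\n".join(json.dumps(action) for action in actions)
    )
    return version


def live_files(table_dir: Path, as_of: Optional[int] = None) -> set:
    """Replay the log in order to find the data files that make up the table."""
    files = set()
    for path in sorted((table_dir / "_log").glob("*.json")):
        if as_of is not None and int(path.stem) > as_of:
            break  # stop early to read an older version of the table
        for line in path.read_text().splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"])
            elif "remove" in action:
                files.discard(action["remove"])
    return files


table = Path(tempfile.mkdtemp()) / "events"

# Version 0: two data files are added to the table.
commit(table, [{"add": "part-0.parquet"}, {"add": "part-1.parquet"}])

# Version 1: compaction replaces the two small files with one, in one commit.
commit(table, [
    {"remove": "part-0.parquet"},
    {"remove": "part-1.parquet"},
    {"add": "part-2.parquet"},
])

current = live_files(table)             # files a reader scans today
previous = live_files(table, as_of=0)   # files as of the first commit
```

Because the data files themselves never change, any compute engine that can read the log and the files can query the table, which is one way to picture the storage and compute decoupling described above. Replaying the log only up to an earlier commit also gives a rough picture of how reading an older table version can work.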
