
Reflections On Designing A Data Platform From Scratch

Data Engineering Podcast


Data Lakes - The Data Orchestration Layer

The data orchestration piece is becoming one of the most important choices that you make, because it becomes the control center of your entire data platform. And so, given my choice to use a data lake storage approach, where I'm using S3 as the storage location for my files, I am planning on using Presto or Trino as the SQL interface. The other aspect of the storage story when working with data lakes is that you need to think about what the actual format of that data is. So am I just storing it as newline-delimited JSON? Am I storing it as binary blobs for any relational data? I'm focused on using Parquet because it's a very well
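
As a rough illustration of the file-format question raised in this excerpt, the sketch below writes a few records as newline-delimited JSON and reads them back using only the Python standard library. The record fields and the helper names `to_ndjson` and `from_ndjson` are hypothetical and do not come from the episode. A columnar format such as Parquet would typically need a third-party library and is not shown here.

```python
import json


def to_ndjson(records):
    """Serialize an iterable of dicts as newline-delimited JSON (one object per line)."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)


def from_ndjson(text):
    """Parse newline-delimited JSON back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical example records (illustrative only, not from the episode).
records = [
    {"id": 1, "event": "signup"},
    {"id": 2, "event": "login"},
]

payload = to_ndjson(records)     # text that could be written to an object in the lake
restored = from_ndjson(payload)  # round-trip back into Python dicts
```

Each line is an independent JSON document, so a file can be appended to or split on line boundaries, but every reader has to parse every field of every row, which is one reason a columnar format may be preferred for analytical queries.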

