Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

NOTE

Developments in Data Warehousing and Open Data Lakes

Data warehousing is evolving in two main directions, and the split is especially visible in open data lakes. One approach uses dbt for step-by-step transformations; the other relies on materialized views. With materialized views, a change is reflected as soon as the data is queried, whereas a change to a query or pipeline affects only the data processed after the change. Open and proprietary data lakes differ mainly in where the data lives: teams often avoid keeping all of their data in a proprietary platform such as Snowflake or Redshift because doing so is seen as costly and cumbersome, and store it in S3 instead, which is cost-effective and easy to back up. Both the split within the open community over warehousing approaches and the choice between open data lakes and proprietary systems come down to different trade-offs in data storage and query optimization.
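The contrast between the two approaches can be sketched with a toy model in Python. This is an illustration under one reading of the note (a definition change shows up in every row of a view at query time, but only in rows loaded afterwards by a pipeline step); it is not how Trino, Iceberg, or dbt are implemented. The class names `PipelineTable` and `ViewOverRaw`, the transforms, and the sample rows are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Transform = Callable[[Row], Row]


def keep_country(row: Row) -> Row:
    # First (hypothetical) definition: pass the country code through as-is.
    return {"id": row["id"], "country": row["country"]}


def upper_country(row: Row) -> Row:
    # Revised (hypothetical) definition: normalize the code to upper case.
    return {"id": row["id"], "country": str(row["country"]).upper()}


class PipelineTable:
    """Toy pipeline step: transforms each row once, at load time."""

    def __init__(self, transform: Transform) -> None:
        self.transform = transform
        self.stored: List[Row] = []

    def load(self, row: Row) -> None:
        # The transformed result is what gets stored.
        self.stored.append(self.transform(row))

    def query(self) -> List[Row]:
        return list(self.stored)


class ViewOverRaw:
    """Toy view: keeps raw rows and applies the current definition on query."""

    def __init__(self, transform: Transform) -> None:
        self.transform = transform
        self.raw: List[Row] = []

    def load(self, row: Row) -> None:
        # Only the raw row is stored.
        self.raw.append(row)

    def query(self) -> List[Row]:
        return [self.transform(r) for r in self.raw]


def demo() -> Tuple[List[Row], List[Row]]:
    pipeline = PipelineTable(keep_country)
    view = ViewOverRaw(keep_country)

    first = {"id": 1, "country": "de"}
    pipeline.load(first)
    view.load(first)

    # Change the definition after the first row has been loaded.
    pipeline.transform = upper_country
    view.transform = upper_country

    second = {"id": 2, "country": "fr"}
    pipeline.load(second)
    view.load(second)

    return pipeline.query(), view.query()


if __name__ == "__main__":
    pipeline_rows, view_rows = demo()
    print("pipeline:", pipeline_rows)
    print("view:    ", view_rows)
```

In this sketch the pipeline table still holds the first row in its old form, while the view returns both rows under the new definition. A real materialized view stores precomputed results and refreshes them, so the sketch only models the visible behavior, not the storage strategy.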