

Gnarly Data Waves by Dremio
Dremio (The Open Data Lakehouse Platform)
Gnarly Data Waves is a weekly show about the world of Data Analytics and Data Architecture. Learn about the technologies giving companies access to cutting-edge insights. If you work with datasets, data warehouses, data lakes, or data lakehouses, this show is for you!
Join us for our live recordings to participate in the Q&A:
dremio.com/events
Subscribe to the Dremio YouTube channel:
youtube.com/dremio
Take the Dremio Platform for a free test-drive:
https://www.dremio.com/test-drive/
Episodes

Oct 6, 2023 • 1h
EP33 - The Who, What and Why of Data Lakehouse Table Formats (Apache Iceberg, Delta Lake and Apache Hudi)
In the rapidly evolving landscape of big data, Data Lakehouse is heralding a new age of unified analytics, blending the best elements of data lakes and data warehouses. Central to this convergence is the need for advanced table formats that can meet the demands of scalability, performance, and data reliability. This webinar dives deep into the world of Data Lakehouse table formats, specifically focusing on Apache Iceberg, Delta Lake, and Apache Hudi.
Who should watch this video?
Data engineers, data architects, data analysts, and other professionals interested in modernizing their data platform or seeking deeper insights into the technicalities and advantages of these advanced table formats.
Key Takeaways:
- Introduction to Data Lakehouse: Explore the genesis of the Data Lakehouse paradigm, its significance, and how it’s reshaping the way organizations think about big data storage and analytics.
- Demystifying Apache Iceberg, Delta Lake, and Apache Hudi: Understand the intricacies of these popular table formats, their architectural nuances, and how they differ from traditional table structures.
- Features Spotlight: Delve into the unique feature sets that each format brings to the table, from ACID transactions and time-travel queries to efficient upserts and scalability features.
- The Relevance Quotient: Understand why these table formats matter in today's data-driven world. Learn about their roles in ensuring data consistency, improving query performance, and facilitating near real-time analytics on large datasets.
- Best Practices and Use Cases: Explore real-world scenarios where organizations have leveraged these formats to transform their data analytics operations, and glean best practices for successful implementation and optimization.
Watch this video to uncover the intricate dance of modern table formats that are at the heart of the Data Lakehouse revolution. Equip yourself with the knowledge to harness their power, ensuring a robust and efficient data infrastructure for your organization.
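To make the time-travel idea from the takeaways concrete, here is a minimal Python sketch. It is a toy model only: the `ToyTable` class and its methods are hypothetical names invented for illustration, and it does not reflect how Apache Iceberg, Delta Lake, or Apache Hudi actually store metadata. It only shows the general principle that each commit produces a new immutable snapshot, so older versions remain readable.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy table that keeps every committed snapshot (hypothetical, not a real format)."""

    snapshots: list = field(default_factory=list)

    def commit(self, rows):
        # Each commit appends an immutable snapshot; older ones stay readable.
        self.snapshots.append(tuple(rows))
        return len(self.snapshots) - 1  # snapshot id

    def read(self, snapshot_id=None):
        # Default to the latest snapshot; pass an id to "time travel".
        if not self.snapshots:
            return []
        if snapshot_id is None:
            snapshot_id = len(self.snapshots) - 1
        return list(self.snapshots[snapshot_id])


table = ToyTable()
first = table.commit([("a", 1)])
second = table.commit([("a", 1), ("b", 2)])
```

Reading with `table.read(first)` returns the table as it looked after the first commit, while `table.read()` returns the latest state.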
See all upcoming episodes: https://www.dremio.com/gnarly-data-wa...
Connect with us!
Twitter: https://bit.ly/30pcpE1
LinkedIn: https://bit.ly/2PoqsDq
Facebook: https://bit.ly/2BV881V
Community Forum: https://bit.ly/2ELXT0W
Github: https://bit.ly/3go4dcM
Blog: https://bit.ly/2DgyR9B
Questions?: https://bit.ly/30oi8tX
Website: https://bit.ly/2XmtEnN
#datalakehouse #analytics #datawarehouse #datalake #dataengineers #dataarchitects #governance #infrastructure #dremiocloud #dremiotestdrive #openlakehouse #opendatalakehouse #apacheiceberg #dremioarctic #datamesh #metadata #modernization #datasharing #migration #ETL #datasilos #selfservice #compliance #dataascode #branches #optimized #automates #datamovement #clustering #metrics #filtering #partitioning #tableformat #ApacheArrow #projectnessie #dremiosonar #optimization #automaticdata #scalability #enterprisedata #federated #catalogmigratortool #reflections #ML #versioning #tables #catalog #accelerate #analytics #ELT #dataanalytis #ACIDtransactions #time-travelqueries

Sep 15, 2023 • 49min
EP32 - Introduction to Dremio Arctic: Catalog Versioning and Iceberg Table Optimization
The data lakehouse is an architectural strategy that combines the flexibility and scalability of data lake storage with the data management, data governance, and data analytics capabilities of the data warehouse. As more organizations adopt this architecture, data teams need a way to deliver a consistent, accurate, and performant view of their data for all of their data consumers.
In this video, we will share how Dremio Arctic, a data lakehouse management service:
- Enables easy catalog versioning using data as code, so everyone has access to consistent, accurate, and high-quality data.
- Automatically optimizes Apache Iceberg tables, reducing management overhead and storage costs while ensuring high performance on large tables.
- Eliminates the need to manage and maintain multiple copies of the data for development, testing, and production.

Sep 13, 2023 • 37min
EP31 - ELT, ETL and Dremio Data Lakehouse
Watch this video for an insightful discussion titled "ELT, ETL, and the Dremio Data Lakehouse," where we explore the capabilities of Dremio for data engineering and analytics workflows. This webinar delves into the strategic use of Dremio's technologies to optimize Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) patterns for enhanced efficiency and cost-effectiveness.
The session will commence with an in-depth exploration of traditional ETL and ELT methodologies, highlighting the challenges faced by organizations in managing large-scale data transformations. We will analyze the critical role of ELT patterns in the modern data landscape and the growing significance of data lakes for storage and processing.
Subsequently, we will introduce Dremio, a powerful and flexible data lakehouse platform, as a game-changer for executing ETL and ELT operations. Dremio's unique architecture empowers users to directly query data residing in the data lake, eliminating the need for unnecessary data copies and reducing data movement overhead significantly.
During the webinar, attendees will gain valuable insights into how Dremio's no-copy architecture minimizes data redundancy, accelerates data processing, and drastically reduces the associated costs. By harnessing the full potential of data lake storage, organizations can simplify their data engineering workflows, enhance data availability, and achieve unparalleled performance for analytical workloads.
Key webinar takeaways:
- A comprehensive overview of ETL and ELT patterns and their relevance in modern data environments.
- The rise of data lakes and the pivotal role of Dremio's data lakehouse platform in transforming data management paradigms.
- Understanding the benefits of Dremio's no-copy architecture in optimizing data processing and analytics.
- Best practices and practical implementation tips for leveraging Dremio effectively in ETL and ELT workflows.
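As a rough illustration of how the ETL and ELT patterns differ in ordering, the sketch below uses Python's built-in `sqlite3` module as a stand-in engine. The sample rows, table names, and helper functions are all hypothetical and chosen for illustration; this is not Dremio's API or the workflow shown in the webinar. The ETL path cleans the data in application code before loading it, while the ELT path loads the raw rows as-is and transforms them with SQL inside the engine.

```python
import sqlite3

# Hypothetical raw input: amounts arrive as messy text.
RAW_ORDERS = [("o1", " 19.50 "), ("o2", "5.00"), ("o3", "n/a")]


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def run_etl(conn):
    # ETL: clean rows in application code first, then load only the clean result.
    cleaned = [
        (oid, float(amt.strip()))
        for oid, amt in RAW_ORDERS
        if _is_number(amt.strip())
    ]
    conn.execute("CREATE TABLE orders_etl (order_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?)", cleaned)
    return conn.execute(
        "SELECT order_id, amount FROM orders_etl ORDER BY order_id"
    ).fetchall()


def run_elt(conn):
    # ELT: load raw rows untouched, then transform with SQL inside the engine.
    conn.execute("CREATE TABLE orders_raw (order_id TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?)", RAW_ORDERS)
    conn.execute(
        "CREATE VIEW orders_elt AS "
        "SELECT order_id, CAST(TRIM(amount) AS REAL) AS amount "
        "FROM orders_raw WHERE TRIM(amount) GLOB '[0-9]*'"
    )
    return conn.execute(
        "SELECT order_id, amount FROM orders_elt ORDER BY order_id"
    ).fetchall()
```

Both functions yield the same cleaned rows. The difference is where the transformation runs and whether the raw data stays available in the engine for later reprocessing, which it does in the ELT path.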

Aug 29, 2023 • 48min
EP29 - Simplify data governance at scale across all your data
As data volumes grow, and more users across your organization want access to data to accelerate business decision-making, managing data governance is more important than ever. Watch this video to learn how to simplify data governance for analytics and deliver data governance at scale with Dremio.
You will learn:
- Data governance on the data lakehouse
- How to balance data access and control to accelerate analytics
- What does good data governance look like?
- How Dremio supports simplified data governance

Aug 22, 2023 • 42min
EP28 - Apache Iceberg Office Hours
Join the Dremio developer advocacy and engineering teams for an installment of Apache Iceberg Office Hours. We’ll open with a brief Iceberg presentation on table format interoperability, covering table format migration options, converters, and newer interoperability solutions like OneTable and UniForm. We’ll go through the capabilities, limitations, and considerations, then leave plenty of dedicated time for Q&A on the presented topic or any other questions you have about learning Apache Iceberg or architecting your data lakehouse around it.
We will cover topics:
- Format Interop
- Using Lakehouse Engines to Unite Table Formats
- Using Onehouse's OneTable Technology
- Using Delta Lake 3.0 UniForm
- Consideration - Consistency
- Consideration - Vendor Agnosticism
- Consideration - Flexibility
Examples of questions you can ask:
- How can I optimize my Iceberg tables for my different use cases?
- What tools will best handle my ETL job to write to Iceberg?
- How can I control access to my Iceberg tables?
- How can I convert data from X into an Iceberg table?
- How can I get started with Iceberg in Databricks?

Aug 8, 2023 • 50min
EP27 - How Maersk is Building A Next Gen Data Lakehouse with Dremio
Maersk is a global leader in container shipping, logistics, and energy. With an extensive network of offices in 116 countries, over 900 vessels, hundreds of warehouses, and a modern fleet of aircraft, Maersk provides comprehensive shipping services across the globe, with commitments to achieve decarbonization and reach net-zero emissions.
Join this live fireside chat with Mark Sear, Director of Data Analytics and AI/ML at Maersk, and Tomer Shiran, founder and chief product officer at Dremio, as they talk about Maersk’s journey in building a next-generation data platform for solution development using Dremio’s open data lakehouse and GenerativeAI.
In this video, you will learn:
- Common data platform challenges in the shipping and logistics industry
- How Maersk uses Dremio’s open data lakehouse to empower their developers and end users to deliver agile and cost-effective solutions
- A live demo of GenerativeAI

Jul 27, 2023 • 34min
EP26 - Versioning Data in the Data Lakehouse (File, Table and Catalog Versioning)
Versioning is a technique that has helped software developers build practices for integrating and deploying new code continuously, allowing for more rapid development of software. In a world where data is being generated faster than ever, the data community needs technology that allows for rapid integration and deployment of new data.
In this video, we’ll discuss:
- 3 Levels of Versioning on the Data Lakehouse (File, Table and Catalog)
- Pros and Cons to each versioning paradigm
- When should you use each?
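For a rough feel of what catalog-level versioning means, here is a small Python sketch. It is a toy model under stated assumptions: the `ToyCatalog` class, branch names, and snapshot ids are hypothetical, and the merge is deliberately naive (no conflict detection). It is not the API of Project Nessie or Dremio Arctic. The idea it illustrates is that a branch is a set of pointers from table names to table snapshots, so changes across several tables can be isolated on a branch and then published together.

```python
class ToyCatalog:
    """Toy catalog: each branch maps table names to snapshot ids (hypothetical model)."""

    def __init__(self):
        self.branches = {"main": {}}

    def create_branch(self, name, source="main"):
        # A branch starts as a cheap copy of pointers, not a copy of the data.
        self.branches[name] = dict(self.branches[source])

    def commit(self, branch, table, snapshot_id):
        # Point one table at a new snapshot on one branch only.
        self.branches[branch][table] = snapshot_id

    def merge(self, source, target="main"):
        # Naive merge: the source branch's pointers win; no conflict detection.
        self.branches[target].update(self.branches[source])


catalog = ToyCatalog()
catalog.commit("main", "orders", "snap-1")
catalog.create_branch("etl")
catalog.commit("etl", "orders", "snap-2")
catalog.commit("etl", "customers", "snap-1")
main_before_merge = dict(catalog.branches["main"])
catalog.merge("etl")
```

Until the merge, readers of `main` keep seeing `orders` at `snap-1`. After the merge, both table changes become visible on `main` in one step.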

Jul 27, 2023 • 38min
EP25 - Building a Data Lakehouse on Azure Data Lake Storage
In the rapidly evolving data landscape, organizations seek to use data assets to drive growth and competitive advantage. The problem is, the rigid warehouse-centric data architecture makes it hard to deliver faster access to data to end users without creating data copies and siloed ETL pipelines. As cloud data lakes grow, the challenge for many organizations will be providing access to that data for exploratory BI and interactive analytics.
In this video, you will learn about building a data lakehouse on Azure Data Lake Storage with product leaders from Microsoft and Dremio:
- The fundamentals of a data lakehouse architecture on Azure
- The need for an open data lakehouse
- Unifying data access on ADLS with Dremio
- A self-service experience with Dremio and Power BI

Jul 12, 2023 • 50min
EP24 - Simplifying Data Mesh with Dremio's Open Data Lakehouse
As the data mesh paradigm gains adoption across enterprises, it’s hard to ignore the increasing focus on the architectural aspects of this approach, which often overshadows the crucial socio-organizational element. The problem is, it’s hard to implement the concept of data mesh if the technology and organizational aspects are not aligned. Business units need faster access to unified data and data teams want to simplify data architecture.
Watch Nik Acheson, Sr. Director of Product Management and GTM Strategy from Dremio, as he talks about getting started with data mesh and how Dremio’s open data lakehouse brings the concepts of data mesh to life. In this video, you will:
- Understand the core principles of data mesh
- Explore the benefits of data mesh and how to navigate organizational adoption
- Learn how Dremio’s open data lakehouse simplifies the data mesh journey with a 3-part phased approach

Jun 28, 2023 • 35min
EP23 - Getting Started With Dremio Data Reflections
For analytical workloads, data teams today have various options to choose from in terms of data warehouses and lakehouse query engines. To enable self-service, they provide a semantic layer for end users, usually with materialized views, BI extracts, or OLAP cubes. The problem is, this process creates data copies and requires end users to understand the underlying physical data model.
Join the Dremio engineering team in this episode of Gnarly Data Waves to learn about accelerating your queries with data reflections. Get answers to business questions faster without the challenges that come with today's approach, such as governing data copies or managing complex aggregate tables and materialized views.
In this video, you will learn:
- The importance of data reflections and how they remove the need for data copies
- When to use raw reflections and aggregate reflections
- Best practices on data reflection refreshes


