

Gnarly Data Waves by Dremio
Dremio (The Open Data Lakehouse Platform)
Gnarly Data Waves is a weekly show about the world of Data Analytics and Data Architecture. Learn about the technologies giving companies access to cutting-edge insights. If you work with datasets, data warehouses, data lakes, or data lakehouses, this show is for you!
Join us for our live recordings to participate in the Q&A:
dremio.com/events
Subscribe to the Dremio YouTube channel on:
youtube.com/dremio
Take the Dremio Platform for a free test-drive:
https://www.dremio.com/test-drive/
Episodes

Jun 21, 2023 • 58min
EP22 - Dremio and Data Lakehouse Table Formats (Apache Iceberg, Delta Lake and Apache Hudi & Dremio)
In their efforts to implement a data lakehouse, many teams have adopted one of the three major data lakehouse table formats. In this video, you’ll learn how the different formats can be used with Dremio’s lakehouse platform.
- What is a Table Format
- What is Iceberg, Delta Lake and Hudi
- Reading with Dremio
- Using Multiple Formats with Dremio
- Accelerating Queries with Dremio
See all upcoming episodes: https://www.dremio.com/gnarly-data-wa...
Connect with us!
Twitter: https://bit.ly/30pcpE1
LinkedIn: https://bit.ly/2PoqsDq
Facebook: https://bit.ly/2BV881V
Community Forum: https://bit.ly/2ELXT0W
Github: https://bit.ly/3go4dcM
Blog: https://bit.ly/2DgyR9B
Questions?: https://bit.ly/30oi8tX
Website: https://bit.ly/2XmtEnN
#datalakehouse #analytics #datawarehouse #datalake #dataengineers #dataarchitects #governance #infrastructure #dremiocloud #dremiotestdrive #openlakehouse #opendatalakehouse #gnarlydatawaves #apacheiceberg #dremioarctic #datamesh #metadata #modernization #datasharing #migration #ETL #datasilos #selfservice #compliance #dataascode #branches #optimized #automates #datamovement #clustering #metrics #filtering #partitioning #tableformat #ApacheArrow #projectnessie #dremiosonar #optimization #automaticdata #scalability #enterprisedata #federated #catalogmigratortool #apachespark #ML #changedatacapture

Jun 16, 2023 • 1h
EP21 - Data as Code with Dremio Arctic: ML Experimentation & Reproducibility on the Lakehouse
As more data consumers require access to critical customer and operational data in the data lake, data teams need solutions that enable multiple users to leverage the same view of the data for a wide range of use cases without impacting each other. In this video of Gnarly Data Waves, we will discuss how the data as code capabilities in Dremio Arctic enable data scientists to:
- Create a data science branch of the production branch for experimentation without creating expensive data copies or impacting production workloads
- Easily work and collaborate cross-functionally with other data consumers and line of business experts
- Quickly reproduce models and results by returning to previous branch states with tags and commit history

Jun 7, 2023 • 54min
EP20 - What's New in the Apache Iceberg Project: Updates, PyIceberg, Compute Engines
The Apache Iceberg project has made tremendous strides, evolving on various fronts such as usage, ecosystem adoption, community growth, and capabilities. In the past few months, the project has introduced many new features and performance improvements across the core library, compute engines, and standalone libraries (such as PyIceberg) that make this lakehouse technology robust and valuable for organizations. In this video of Gnarly Data Waves, we will go over some of the notable new capabilities of Apache Iceberg.
Specifically, we will discuss:
- Version 1.2.0 release
- Features such as branching/tagging, new write-distribution-mode, Change Data Capture, Catalog Migrator Tool, and Delta to Iceberg migration
- PyIceberg (What’s happening in the Python library)
- Compute Engine-specific features: Dremio, Apache Spark, Flink

Jun 1, 2023 • 40min
EP19 - Data Mesh In Practice: Accelerating Cancer Research with Dremio's Data Lakehouse
Memorial Sloan Kettering Cancer Center (MSK) is the largest private cancer center in the world and has devoted more than 135 years to exceptional patient care, innovative research, and outstanding educational programs. Today, MSK is one of 52 National Cancer Institute-designated Comprehensive Cancer Centers, with state-of-the-art science flourishing side by side with clinical studies and treatment.
Join Arfath Pasha, Sr. Engineer at Memorial Sloan Kettering, as he shares his data mesh experience building a scientific data and compute infrastructure for accelerating cancer research. In this episode, you will learn:
- Use cases for creating a central data lake for all enterprise data
- How Dremio’s data lakehouse enables data mesh
- Best practices for making data easier to discover, understand, and trust for data consumers

May 24, 2023 • 38min
EP18 - Best Practices for Modernizing Your Hadoop Workloads to AWS with Dremio
Many organizations turned to HDFS to address the challenge of storing growing volumes of semi-structured and unstructured data. However, Hadoop never managed to replace the data warehouse for enterprise-grade Business Intelligence and Reporting, and most teams ended up with separate monolithic architectures including data lakes and data warehouses, with siloed data and analytic workloads. That is why data teams are increasingly considering a data lakehouse architecture that combines the flexibility and scalability of data lake storage with the data management, data governance, and enterprise-grade analytic performance of the data warehouse. In this episode, Jorge A. Lopez, Product Specialist for Analytics at AWS, and Dremio's Jeremiah Morrow will discuss best practices for modernizing analytic workloads from Hadoop to an open data lakehouse architecture, including:
- Choosing the right storage solution for your data lakehouse, and which features and functionality, such as performance, scalability, reliability, and more, you should be evaluating.
- Specific steps and best practices for gradually shifting on-premises workloads to a cloud data lakehouse while ensuring business continuity.
- Consolidating data silos to achieve a complete view of your customer and operational data before, during, and after migration.

May 18, 2023 • 44min
EP17 - Unified Access for Your Data Mesh Self Service Data with Dremio's Semantic Layer
Data silos and a lack of collaboration between teams have been long-standing challenges in data management. This is where data mesh comes into play as an architectural and organizational paradigm, providing a solution by enabling decentralized teams to work collaboratively and share data in a governed manner across the enterprise.
Dremio’s semantic layer provides a particularly useful tool for meeting both of these needs. In this video, we will discuss:
- The needs of a data mesh (Data Products, Computational Governance, Self-Service)
- The open and decentralized nature of the Dremio Open Data Lakehouse
- How data products can be created and shared with Dremio’s semantic layer
- How governance can be architected centrally using fine-grained access rules
- How to unify your data products across the enterprise
- How the Dremio to Dremio connector enables sharing between domains

May 10, 2023 • 32min
EP16 - Easy Data Lakehouse Management with Dremio Arctic’s Automatic Data Optimization
While cloud data lakes address the need to efficiently store large volumes of structured, semi-structured, and unstructured data, they have traditionally lacked the data management and data governance capabilities that have tied enterprise data teams to data warehouse architectures. In this video, learn how Dremio Arctic, a lakehouse management service, delivers automatic data optimization features that simplify data management and enable high-performance analytics directly on data in the data lake. We'll cover:
- The open data lakehouse architecture, and the importance of a lakehouse management service like Dremio Arctic.
- Dremio Arctic's data optimization capabilities.
- How these features ensure high-performance analytics and an optimal storage footprint while reducing the management burden for data teams.

May 3, 2023 • 1h 1min
EP15 - Getting Started with Dremio’s Data Lakehouse
As organizations strive to deliver value to end users faster, data silos make it difficult to provide timely insights. Learn how Dremio’s data lakehouse accelerates data delivery and discovery, without copies.
In this video, you will get:
- The fundamentals of the data lakehouse with Dremio and Apache Iceberg
- Proven use cases for unifying data access on the lakehouse
- Customer success stories

May 1, 2023 • 43min
EP14 - Enabling Data Mesh with Dremio Arctic and Data as Code
Many organizations are moving to a data mesh, a decentralized approach to data architecture that emphasizes domain ownership of data products. Data as code is the practice of managing data the same way software developers manage code in application development, and in a data mesh architecture it can simplify and accelerate the process of building, managing, and sharing data products. In this video, you'll learn:
- Why businesses adopt a data mesh strategy, and key components of a data mesh architecture.
- How data as code enables domain owners to build, manage, and share data products.
- How Dremio Arctic delivers data as code functionality so domain owners can ship data products as easily as developers ship software products.

Apr 25, 2023 • 54min
EP13 - Making the Move: Five Factors to Consider When Migrating from Hadoop to the Data Lakehouse
Most users of the Hadoop platform are fed up with its high cost of operational overhead and poor performance. With innovations around open source standards, like Apache Iceberg and Arrow, the data lakehouse has emerged as the destination for companies migrating off Hadoop.
In this video of Gnarly Data Waves, you will learn about:
- 5 key factors to consider as you migrate off Hadoop to the Data Lakehouse
- Why Apache Iceberg replaced Hive metastore
- Creating a unified access layer on your data lakehouse with Dremio


