

Gnarly Data Waves by Dremio
Dremio (The Open Data Lakehouse Platform)
Gnarly Data Waves is a weekly show about the world of Data Analytics and Data Architecture. Learn about the technologies giving companies access to cutting-edge insights. If you work with datasets, data warehouses, data lakes, or data lakehouses, this show is for you!
Join us for our live recordings to participate in the Q&A:
dremio.com/events
Subscribe to the Dremio YouTube channel at:
youtube.com/dremio
Take the Dremio Platform for a free test-drive:
https://www.dremio.com/test-drive/
Episodes
Mentioned books

Apr 12, 2023 • 1h 4min
EP12 - How to Modernize Hive to the Data Lakehouse with Dremio and Apache Iceberg
Learn how to modernize Hive to the Data Lakehouse using Dremio and Apache Iceberg. Explore in-place migration, shadow migration, and moving tables between catalogs. Discover the benefits of Apache Iceberg, different catalog options, and optimistic concurrency. Dive into automating compaction and optimizing tables, pricing and compatibility with Dremio, Kafka, Nessie, and Iceberg. Also, get insights on using the AWS Glue catalog with Iceberg and the upcoming Nessie connector for the Dremio Arctic catalog.

Apr 5, 2023 • 36min
EP11 - Apache Iceberg Office Hours - Apache Iceberg 1.2.0 has been released
Listen to Dremio's developer advocacy and engineering teams for an installment of Apache Iceberg Office Hours. We begin with a brief presentation on Hidden Partitioning and partition transforms in Iceberg, followed by plenty of dedicated Q&A time on the presented topic, or on any other questions you have about learning Apache Iceberg or architecting your data lakehouse around it.
Questions being asked:
How can I optimize my Iceberg tables for my different use cases?
What tools will best handle my ETL job to write to Iceberg?
How can I control access to my Iceberg tables?
How can I convert data from X into an Iceberg table?
How can I get started with Iceberg in Databricks?
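The hidden partitioning discussed in this episode can be illustrated with a short sketch. The idea is that a partition value is derived from a row's own columns by a declared transform (such as `day(ts)` or `bucket(n, id)`), so writers compute partitions automatically and readers never have to filter on a separate partition column. This is a toy model, not Iceberg's implementation: real Iceberg buckets values with a 32-bit Murmur3 hash, while MD5 is used here only to keep the sketch dependency-free, so the bucket numbers will differ from Iceberg's.

```python
import hashlib
from datetime import datetime

def day_transform(ts: datetime) -> str:
    """Truncate a timestamp to its day, like Iceberg's day() transform."""
    return ts.strftime("%Y-%m-%d")

def bucket_transform(value: str, num_buckets: int) -> int:
    """Hash a value into one of num_buckets buckets.

    Iceberg uses a 32-bit Murmur3 hash; MD5 stands in here only so the
    sketch needs no third-party dependency (bucket numbers will differ).
    """
    h = int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")
    return h % num_buckets

# "Hidden" partitioning: the writer derives the partition tuple from the
# row itself; the reader never sees or manages a partition column.
row = {"id": "order-1001", "ts": datetime(2023, 4, 12, 9, 30)}
partition = (day_transform(row["ts"]), bucket_transform(row["id"], 16))
print(partition)
```

Because the transform is recorded in table metadata rather than baked into the data, the partition scheme can later evolve without rewriting old files.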

Mar 31, 2023 • 42min
EP10 - Optimizing Data Files in Apache Iceberg: Performance Strategies
Querying hundreds of petabytes of data demands optimized query speed, especially as data accumulates over time: you can end up with a lot of small files, and your data may no longer be optimally organized, so you have to actively keep queries efficient.
In this video, Dipankar will cover:
Apache Iceberg table format
Problems in the data lake: small files, unorganized files
Techniques such as: partitioning, compaction, metrics filtering
Overlapping metrics problem
Solving it using sorting, Z-order clustering
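The Z-order clustering mentioned above can be sketched in a few lines. When files' min/max metrics overlap on several columns, sorting on one column alone doesn't help queries that filter on the others; Z-ordering interleaves the bits of multiple column values into a single sort key, so rows close in every dimension land in the same files and per-file metrics stay tight on all of them. This is a minimal illustrative sketch, not Dremio's or Iceberg's implementation:

```python
def z_order_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of two column values into one Z-order key.

    Sorting rows by this key clusters values that are close in BOTH
    dimensions, which tightens per-file min/max metrics on both columns.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # even bit positions from x
        key |= ((y >> i) & 1) << (2 * i + 1)  # odd bit positions from y
    return key

# Cluster rows by the interleaved key before writing them out to files.
rows = [(3, 5), (0, 0), (7, 1), (2, 6)]
rows.sort(key=lambda r: z_order_key(*r))
```

A compaction job can then bin-pack the sorted rows into appropriately sized files, addressing the small-files problem and the overlapping-metrics problem in one rewrite.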
See all upcoming episodes: https://www.dremio.com/gnarly-data-wa...
Connect with us!
Twitter: https://bit.ly/30pcpE1
LinkedIn: https://bit.ly/2PoqsDq
Facebook: https://bit.ly/2BV881V
Community Forum: https://bit.ly/2ELXT0W
Github: https://bit.ly/3go4dcM
Blog: https://bit.ly/2DgyR9B
Questions?: https://bit.ly/30oi8tX
Website: https://bit.ly/2XmtEnN

Mar 22, 2023 • 42min
EP9 - Build your open data lakehouse with Apache Iceberg, Fivetran, and Dremio
The data lakehouse is quickly emerging as the ideal data architecture because it combines the flexibility and scalability of data lakes with the data management, data governance, and data analytics capabilities of data warehouses. Table formats bring many of the “house” features to the data lakehouse. Apache Iceberg is a truly open table format that is built for easy management and high performance analytics on the largest data volumes in the world.
In this video, we’ll discuss:
- Why open table formats are fundamental to building a data lakehouse
- How Fivetran automates data movement and helps organizations easily move data from various sources to their Amazon S3 data lake in Apache Iceberg tables.
- How Dremio & Fivetran simplify your data lakehouse architecture while providing high performance and ease of use.
See all upcoming episodes: https://www.dremio.com/gnarly-data-wa...

Mar 15, 2023 • 49min
EP8 - Managing your data as code with Dremio Arctic
As data lakes become the primary destination for growing volumes of customer and operational data, data teams need tools and processes that ensure data quality and consistency across data consumers and use cases. Join Dremio’s Jeremiah Morrow and Alex Merced as they discuss the emergence of data as code for data management, its benefits for data teams, and how Dremio customers are using it to deliver access to a consistent and accurate view of data in their data lakes.
In this video on Gnarly Data Waves - Managing your data as code with Dremio Arctic, you will learn about:
- Why data as code is necessary for ensuring consistency and data quality for large data lakes.
- How Dremio Arctic uses Git-like concepts such as branches, tags, and commits to make data management easy.
- Some high value use cases for data as code.
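The Git-like model described above can be made concrete with a toy catalog: branches and tags are just named pointers to commits, and each commit is an immutable snapshot of which version of each table is current. This is a stdlib-only sketch of the concept, not Dremio Arctic's or Nessie's actual API; the class and method names are invented for illustration.

```python
import itertools

class Catalog:
    """Toy Nessie-style catalog: branches point at commits; each commit is
    an immutable snapshot mapping table names to data versions."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.commits = {0: {}}        # commit id -> {table: version}
        self.branches = {"main": 0}   # branch name -> commit id
        self.tags = {}                # tag name -> commit id

    def commit(self, branch, table, version):
        snapshot = dict(self.commits[self.branches[branch]])
        snapshot[table] = version
        cid = next(self._ids)
        self.commits[cid] = snapshot
        self.branches[branch] = cid
        return cid

    def branch(self, name, from_branch="main"):
        self.branches[name] = self.branches[from_branch]

    def tag(self, name, branch):
        self.tags[name] = self.branches[branch]

    def merge(self, src, dst):
        # Fast-forward merge: dst adopts src's commit in one atomic step.
        self.branches[dst] = self.branches[src]

cat = Catalog()
cat.commit("main", "orders", "v1")
cat.branch("etl")                    # ingest and validate on a branch...
cat.commit("etl", "orders", "v2")
assert cat.commits[cat.branches["main"]]["orders"] == "v1"  # consumers see v1
cat.merge("etl", "main")             # ...then publish atomically
assert cat.commits[cat.branches["main"]]["orders"] == "v2"
```

This is what makes data as code useful for quality: consumers on `main` never observe half-finished writes, and a tag can pin a reproducible view of the lake for audits or experiments.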
See all upcoming episodes: https://www.dremio.com/gnarly-data-waves/?utm_medium=social-free&utm_source=youtube&utm_term=GDWEP8&utm_content=gdw-OD&utm_campaign=gdw-EP8

Feb 22, 2023 • 43min
EP7 - Getting Started with Hadoop Migration and Modernization
Most companies use Hadoop for big data analytical workloads. The problem is that on-premises Hadoop deployments often fail to deliver business value once implemented. Over time, high operational costs and poor performance limit an organization's ability to be agile. As a result, data platform teams are looking to modernize their Hadoop workloads to the data lakehouse.
In this video, learn about:
Use cases for modernizing Hadoop workloads
How the data lakehouse solves the inefficiencies of on-premises Hadoop
Success stories from organizations that have modernized Hadoop with the data lakehouse on Dremio

Feb 15, 2023 • 44min
EP6 - Total Economic Impact of Data Lakehouse
As enterprise data platforms look to operate more efficiently, they face pressure to pivot their data management strategies. The increasing volume of data, the demand for self-service analytics that meets compliance requirements, and the complexity of data distribution channels are all factors to consider when making a business case. In this video, we will cover the three-year Total Economic Impact™ of the data lakehouse and its quantifiable benefits to productivity across all teams. You will learn about:
- Key challenges organizations face with explosive data growth and data silos
- Increasing team productivity and focusing more on high-value projects
- Reducing data storage costs and retiring complicated ETL processes
See all upcoming episodes: https://www.dremio.com/gnarly-data-wa...

Feb 8, 2023 • 51min
EP5 - Apache Iceberg Office Hours - Apache Iceberg Partitioning Explanation
Join the Dremio developer advocacy and engineering teams for an installment of Apache Iceberg Office Hours. In this video, we'll have a brief Iceberg presentation on Hidden Partitioning and partition transforms in Iceberg, followed by plenty of dedicated Q&A time on the presented topic, or on any other questions you have about learning Apache Iceberg or architecting your data lakehouse around it.
Examples of questions you can come ask:
How can I optimize my Iceberg tables for my different use cases?
What tools will best handle my ETL job to write to Iceberg?
How can I control access to my Iceberg tables?
How can I convert data from X into an Iceberg table?
How can I get started with Iceberg in Databricks?

Feb 1, 2023 • 50min
EP4 - Best Practices for Optimizing Tableau Dashboards with Dremio
Tableau is a visual analytics platform that helps more people in organizations see and understand their data. Dremio helps Tableau users accelerate access to data, including cloud data lakes, and it can dramatically improve query performance, delivering analytics for every data consumer at interactive speed. In this video, we'll cover:
- how the Dremio open data lakehouse connects Tableau users directly to data lake storage and other data repositories,
- how reflections accelerate query performance for ad hoc analysis and interactive dashboards, and
- how the Dremio semantic layer extends self-service capabilities beyond the visualization layer, so anyone can join and query data easily.
VIDEO ON YOUTUBE: https://www.youtube.com/watch?v=8fzYLgKHIj0

Jan 25, 2023 • 1h 1min
EP3 - Migrating from Delta Lake to Iceberg
Iceberg has been gaining wide adoption in the industry as the de facto open standard for data lakehouse table formats. Join us as we help you learn the options and strategies you can employ when migrating tables from Delta Lake to Apache Iceberg. We’ll cover:
Why migrate to Apache Iceberg
How to do an In-place migration and avoid rewriting files
How to do a shadow migration
Best practices
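The contrast between the two migration strategies covered here can be sketched in miniature. In-place migration keeps the existing Parquet data files and only writes new Iceberg metadata pointing at them, while shadow migration rewrites every data file into a fresh table and cuts over once validated. This is a deliberately simplified toy model (tables as plain dicts, invented function names), not any real migration tool:

```python
def in_place_migrate(source_table):
    """In-place: reuse the existing data files; only the metadata is new.
    Nothing is rewritten, so the migration is fast and cheap."""
    return {"format": "iceberg", "data_files": source_table["data_files"]}

def shadow_migrate(source_table, rewrite):
    """Shadow: rewrite every data file into a new table while the source
    keeps serving reads, then cut over once the copy is validated."""
    return {"format": "iceberg",
            "data_files": [rewrite(f) for f in source_table["data_files"]]}

delta = {"format": "delta",
         "data_files": ["part-000.parquet", "part-001.parquet"]}

iceberg_a = in_place_migrate(delta)
assert iceberg_a["data_files"] is delta["data_files"]  # same files, new metadata

iceberg_b = shadow_migrate(delta, lambda f: f.replace("part", "rewritten"))
assert iceberg_b["data_files"] == ["rewritten-000.parquet",
                                   "rewritten-001.parquet"]
```

The trade-off the episode explores follows from this shape: in-place avoids the cost of rewriting files but inherits their existing layout, while shadow migration costs a full rewrite but lets you repartition, sort, and validate before switching consumers over.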
PRESENTATION ON YOUTUBE: https://youtu.be/11p3AaPduos
Apache Iceberg FAQ: https://www.dremio.com/blog/apache-iceberg-faq/
Apache Iceberg 101: https://www.dremio.com/subsurface/apache-iceberg-101-your-guide-to-learning-apache-iceberg-concepts-and-practices/


