

Gnarly Data Waves by Dremio
Dremio (The Open Data Lakehouse Platform)
Gnarly Data Waves is a weekly show about the world of Data Analytics and Data Architecture. Learn about the technologies giving companies access to cutting-edge insights. If you work with datasets, data warehouses, data lakes, or data lakehouses, this show is for you!
Join us for our live recordings to participate in the Q&A:
dremio.com/events
Subscribe to the Dremio YouTube channel at:
youtube.com/dremio
Take the Dremio Platform for a free test-drive:
https://www.dremio.com/test-drive/
Episodes
Mentioned books

Apr 12, 2023 • 1h 4min
EP12 - How to Modernize Hive to the Data Lakehouse with Dremio and Apache Iceberg
Learn how to modernize Hive to the Data Lakehouse using Dremio and Apache Iceberg. Explore in-place migration, shadow migration, and moving tables between catalogs. Discover the benefits of Apache Iceberg, different catalog options, and optimistic concurrency. Dive into automating compaction and optimizing tables, pricing and compatibility with Dremio, Kafka, Nessie, and Iceberg. Also, get insights on using the AWS Glue catalog with Iceberg and the upcoming Nessie connector for the Dremio Arctic catalog.

Apr 5, 2023 • 36min
EP11 - Apache Iceberg Office Hours - Apache Iceberg 1.2.0 has been released
Listen to Dremio's developer advocacy and engineering teams for an installment of Apache Iceberg Office Hours. We begin with a brief presentation on Hidden Partitioning and partition transforms in Iceberg, followed by plenty of dedicated Q&A time on the presented topic, or on any other questions you have about learning Apache Iceberg or architecting your data lakehouse around it.
Questions being asked:
How can I optimize my Iceberg tables for my different use cases?
What tools will best handle my ETL job to write to Iceberg?
How can I control access to my Iceberg tables?
How can I convert data from X into an Iceberg table?
How can I get started with Iceberg in Databricks?
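The hidden partitioning discussed in this episode can be illustrated with a short sketch. The idea is that a partition value is derived from a row's own columns by a declared transform (such as `day(ts)` or `bucket(n, id)`), so writers compute partitions automatically and readers never have to filter on a separate partition column. This is a toy model, not Iceberg's implementation: real Iceberg buckets values with a 32-bit Murmur3 hash, while MD5 is used here only to keep the sketch dependency-free, so the bucket numbers will differ from Iceberg's.

```python
import hashlib
from datetime import datetime

def day_transform(ts: datetime) -> str:
    """Truncate a timestamp to its day, like Iceberg's day() transform."""
    return ts.strftime("%Y-%m-%d")

def bucket_transform(value: str, num_buckets: int) -> int:
    """Hash a value into one of num_buckets buckets.

    Iceberg uses a 32-bit Murmur3 hash; MD5 stands in here only so the
    sketch needs no third-party dependency (bucket numbers will differ).
    """
    h = int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")
    return h % num_buckets

# "Hidden" partitioning: the writer derives the partition tuple from the
# row itself; the reader never sees or manages a partition column.
row = {"id": "order-1001", "ts": datetime(2023, 4, 12, 9, 30)}
partition = (day_transform(row["ts"]), bucket_transform(row["id"], 16))
print(partition)
```

Because the transform is recorded in table metadata rather than baked into the data, the partition scheme can later evolve without rewriting old files.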

Mar 31, 2023 • 42min
EP10 - Optimizing Data Files in Apache Iceberg: Performance Strategies
Querying hundreds of petabytes of data demands optimized query speed, especially as data accumulates over time: you can end up with a lot of small files, and your data may no longer be optimally organized, so you have to actively keep queries efficient.
In this video, Dipankar will cover:
Apache Iceberg table format
Problems in the data lake: small files, unorganized files
Techniques such as: partitioning, compaction, metrics filtering
Overlapping metrics problem
Solving it using sorting, Z-order clustering
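The Z-order clustering mentioned above can be sketched in a few lines. When files' min/max metrics overlap on several columns, sorting on one column alone doesn't help queries that filter on the others; Z-ordering interleaves the bits of multiple column values into a single sort key, so rows close in every dimension land in the same files and per-file metrics stay tight on all of them. This is a minimal illustrative sketch, not Dremio's or Iceberg's implementation:

```python
def z_order_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of two column values into one Z-order key.

    Sorting rows by this key clusters values that are close in BOTH
    dimensions, which tightens per-file min/max metrics on both columns.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # even bit positions from x
        key |= ((y >> i) & 1) << (2 * i + 1)  # odd bit positions from y
    return key

# Cluster rows by the interleaved key before writing them out to files.
rows = [(3, 5), (0, 0), (7, 1), (2, 6)]
rows.sort(key=lambda r: z_order_key(*r))
```

A compaction job can then bin-pack the sorted rows into appropriately sized files, addressing the small-files problem and the overlapping-metrics problem in one rewrite.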
See all upcoming episodes: https://www.dremio.com/gnarly-data-wa...
Connect with us!
Twitter: https://bit.ly/30pcpE1
LinkedIn: https://bit.ly/2PoqsDq
Facebook: https://bit.ly/2BV881V
Community Forum: https://bit.ly/2ELXT0W
Github: https://bit.ly/3go4dcM
Blog: https://bit.ly/2DgyR9B
Questions?: https://bit.ly/30oi8tX
Website: https://bit.ly/2XmtEnN

Mar 22, 2023 • 42min
EP9 - Build your open data lakehouse with Apache Iceberg, Fivetran, and Dremio
The data lakehouse is quickly emerging as the ideal data architecture because it combines the flexibility and scalability of data lakes with the data management, data governance, and data analytics capabilities of data warehouses. Table formats bring many of the “house” features to the data lakehouse. Apache Iceberg is a truly open table format that is built for easy management and high performance analytics on the largest data volumes in the world.
In this video, we’ll discuss:
- Why open table formats are fundamental to building a data lakehouse
- How Fivetran automates data movement and helps organizations easily move data from various sources to their Amazon S3 data lake in Apache Iceberg tables.
- How Dremio & Fivetran simplify your data lakehouse architecture while providing high performance and ease of use.
See all upcoming episodes: https://www.dremio.com/gnarly-data-wa...

Mar 15, 2023 • 49min
EP8 - Managing your data as code with Dremio Arctic
As data lakes become the primary destination for growing volumes of customer and operational data, data teams need tools and processes that ensure data quality and consistency across data consumers and use cases. Join Dremio’s Jeremiah Morrow and Alex Merced as they discuss the emergence of data as code for data management, its benefits for data teams, and how Dremio customers are using it to deliver access to a consistent and accurate view of data in their data lakes.
In this video on Gnarly Data Waves - Managing your data as code with Dremio Arctic, you will learn about:
- Why data as code is necessary for ensuring consistency and data quality for large data lakes.
- How Dremio Arctic uses Git-like concepts such as branches, tags, and commits to make data management easy.
- Some high value use cases for data as code.
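The Git-like model described above can be made concrete with a toy catalog: branches and tags are just named pointers to commits, and each commit is an immutable snapshot of which version of each table is current. This is a stdlib-only sketch of the concept, not Dremio Arctic's or Nessie's actual API; the class and method names are invented for illustration.

```python
import itertools

class Catalog:
    """Toy Nessie-style catalog: branches point at commits; each commit is
    an immutable snapshot mapping table names to data versions."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.commits = {0: {}}        # commit id -> {table: version}
        self.branches = {"main": 0}   # branch name -> commit id
        self.tags = {}                # tag name -> commit id

    def commit(self, branch, table, version):
        snapshot = dict(self.commits[self.branches[branch]])
        snapshot[table] = version
        cid = next(self._ids)
        self.commits[cid] = snapshot
        self.branches[branch] = cid
        return cid

    def branch(self, name, from_branch="main"):
        self.branches[name] = self.branches[from_branch]

    def tag(self, name, branch):
        self.tags[name] = self.branches[branch]

    def merge(self, src, dst):
        # Fast-forward merge: dst adopts src's commit in one atomic step.
        self.branches[dst] = self.branches[src]

cat = Catalog()
cat.commit("main", "orders", "v1")
cat.branch("etl")                    # ingest and validate on a branch...
cat.commit("etl", "orders", "v2")
assert cat.commits[cat.branches["main"]]["orders"] == "v1"  # consumers see v1
cat.merge("etl", "main")             # ...then publish atomically
assert cat.commits[cat.branches["main"]]["orders"] == "v2"
```

This is what makes data as code useful for quality: consumers on `main` never observe half-finished writes, and a tag can pin a reproducible view of the lake for audits or experiments.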
See all upcoming episodes: https://www.dremio.com/gnarly-data-waves/?utm_medium=social-free&utm_source=youtube&utm_term=GDWEP8&utm_content=gdw-OD&utm_campaign=gdw-EP8

Feb 22, 2023 • 43min
EP7 - Getting Started with Hadoop Migration and Modernization
Most companies use Hadoop for big data analytical workloads. The problem is that on-premises Hadoop deployments often fail to deliver business value once implemented. Over time, high operational costs and poor performance limit an organization's ability to be agile. As a result, data platform teams are looking to modernize their Hadoop workloads to the data lakehouse.
In this video, learn about:
Use cases for modernizing Hadoop workloads
How the data lakehouse solves the inefficiencies of on-premises Hadoop
Success stories from organizations that have modernized Hadoop with the data lakehouse on Dremio

Feb 15, 2023 • 44min
EP6 - Total Economic Impact of Data Lakehouse
As enterprise data platforms look to operate more efficiently, they face pressure to pivot their data management strategies. The increasing volume of data, the demand for self-service analytics that meets compliance requirements, and the complexity of data distribution channels are all factors to consider when making a business case. In this video, we will cover the three-year Total Economic Impact™ of the data lakehouse and its quantifiable benefits to productivity across all teams. You will learn about:
- Key challenges organizations face with explosive data growth and data silos
- Increasing team productivity and focusing more on high-value projects
- Reducing data storage costs and retiring complicated ETL processes
See all upcoming episodes: https://www.dremio.com/gnarly-data-wa...

Feb 8, 2023 • 51min
EP5 - Apache Iceberg Office Hours - Apache Iceberg Partitioning Explanation
Join the Dremio developer advocacy and engineering teams for an installment of Apache Iceberg Office Hours. In this video, we'll have a brief Iceberg presentation on Hidden Partitioning and partition transforms in Iceberg, followed by plenty of dedicated Q&A time on the presented topic, or on any other questions you have about learning Apache Iceberg or architecting your data lakehouse around it.
Examples of questions you can come ask:
How can I optimize my Iceberg tables for my different use cases?
What tools will best handle my ETL job to write to Iceberg?
How can I control access to my Iceberg tables?
How can I convert data from X into an Iceberg table?
How can I get started with Iceberg in Databricks?

Feb 1, 2023 • 50min
EP4 - Best Practices for Optimizing Tableau Dashboards with Dremio
Tableau is a visual analytics platform that helps more people in organizations see and understand their data. Dremio helps Tableau users accelerate access to data, including cloud data lakes, and it can dramatically improve query performance, delivering analytics for every data consumer at interactive speed. In this video, we'll cover:
- how the Dremio open data lakehouse connects Tableau users directly to data lake storage and other data repositories,
- how reflections accelerate query performance for ad hoc analysis and interactive dashboards, and
- how the Dremio semantic layer extends self-service capabilities beyond the visualization layer, so anyone can join and query data easily.
VIDEO ON YOUTUBE: https://www.youtube.com/watch?v=8fzYLgKHIj0

Jan 25, 2023 • 1h 1min
EP3 - Migrating from Delta Lake to Iceberg
Iceberg has been gaining wide adoption in the industry as the de facto open standard for data lakehouse table formats. Join us as we help you learn the options and strategies you can employ when migrating tables from Delta Lake to Apache Iceberg. We’ll cover:
Why migrate to Apache Iceberg
How to do an In-place migration and avoid rewriting files
How to do a shadow migration
Best practices
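The contrast between the two migration strategies covered here can be sketched in miniature. In-place migration keeps the existing Parquet data files and only writes new Iceberg metadata pointing at them, while shadow migration rewrites every data file into a fresh table and cuts over once validated. This is a deliberately simplified toy model (tables as plain dicts, invented function names), not any real migration tool:

```python
def in_place_migrate(source_table):
    """In-place: reuse the existing data files; only the metadata is new.
    Nothing is rewritten, so the migration is fast and cheap."""
    return {"format": "iceberg", "data_files": source_table["data_files"]}

def shadow_migrate(source_table, rewrite):
    """Shadow: rewrite every data file into a new table while the source
    keeps serving reads, then cut over once the copy is validated."""
    return {"format": "iceberg",
            "data_files": [rewrite(f) for f in source_table["data_files"]]}

delta = {"format": "delta",
         "data_files": ["part-000.parquet", "part-001.parquet"]}

iceberg_a = in_place_migrate(delta)
assert iceberg_a["data_files"] is delta["data_files"]  # same files, new metadata

iceberg_b = shadow_migrate(delta, lambda f: f.replace("part", "rewritten"))
assert iceberg_b["data_files"] == ["rewritten-000.parquet",
                                   "rewritten-001.parquet"]
```

The trade-off the episode explores follows from this shape: in-place avoids the cost of rewriting files but inherits their existing layout, while shadow migration costs a full rewrite but lets you repartition, sort, and validate before switching consumers over.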
PRESENTATION ON YOUTUBE: https://youtu.be/11p3AaPduos
Apache Iceberg FAQ: https://www.dremio.com/blog/apache-iceberg-faq/
Apache Iceberg 101: https://www.dremio.com/subsurface/apache-iceberg-101-your-guide-to-learning-apache-iceberg-concepts-and-practices/


