Data Brew Season 1 Episode 1: From data warehousing to data lakes in 40 minutes
Oct 28, 2020
Barry Devlin, Susan O'Connell, and Donald Farmer discuss the evolution from data warehousing to data lakes. They cover why data warehousing mattered in the 1990s, the benefits of data lakes, the challenges of working without management and governance, integration and collaboration challenges, the paradoxes a data lakehouse must resolve, and the significance of information management and the people aspects of dealing with data.
The transition from data warehousing to data lakes allows for storing unstructured data with flexibility, but it also introduces challenges with management and governance.
A successful data lakehouse requires finding the right balance between governance and flexibility, focusing on information management, governance, semantics, and organizational aspects.
Deep dives
Data Warehousing and Data Lakes
The podcast episode discusses the transition from data warehousing to data lakes. The guests explain how data warehousing became important in the 1990s, driven by the need for consistent, reconciled data to support better business decisions. However, data warehouses struggled with agility and timeliness. This led to the emergence of data lakes, which made it possible to store unstructured data with far more flexibility. The guests highlight the advantages of data lakes, such as storing data in its native format, supporting multiple use cases, and handling large-scale analytics. They also discuss the drawbacks, chiefly the lack of management and governance, which can turn a lake into a data swamp or data salad. The guests also mention the idea of a data lakehouse, which combines the structured reporting capabilities of a data warehouse with the flexibility a data lake offers for unstructured data. However, they note how hard it is to reconcile these two concepts and stress the need to find the right balance between governance and flexibility.
Challenges and Paradoxes of Data Lakehouses
The discussion highlights the challenges and paradoxes of combining structured reporting with unstructured data: data must be governed and managed, yet remain flexible and agile. The guests emphasize the importance of clearly defining and understanding the respective roles of data lakes and data warehouses. They caution against over-governing data lakes, which could undermine their flexibility, and recommend finding the right balance between governance and agility. Ultimately, the success of a data lakehouse depends on a shift in mindset from being gatekeepers to being shopkeepers of data, with the focus on information management, governance, semantics, and organizational aspects rather than on specific technologies or platforms.
The Benefits of Data Lakes and Data Warehouses
The podcast guests highlight the advantages of data lakes, such as storing data in a native or near-native format, providing flexibility for various use cases, and accommodating messy, complex data. They discuss the ability to store and access data without extensive transformation or modeling. On the other hand, they acknowledge the benefits of data warehouses, including structured reporting, well-organized data, and enterprise-level governance and performance. While data lakes excel at handling unstructured data and rapid data ingestion, data warehouses offer structured, governed reporting. The guests emphasize that both concepts are needed, and they point to the potential of a hybrid approach that integrates the strengths of data lakes and data warehouses.
The Importance of People and Mindset
The podcast episode emphasizes the importance of people and mindset in the context of data lakes and data warehouses. The guests discuss the need for a collaborative, strategic approach involving both business and IT teams. They stress that technical solutions alone are not sufficient and advocate focusing on information management, governance, and alignment with organizational goals. They suggest that the people aspects, such as user adoption, trust, and change management, deserve as much attention as the technical challenges. The guests also recommend shifting from being gatekeepers to being shopkeepers of data, providing access and governance in a balanced way that supports diverse business needs.
In our inaugural episode, we’d like to welcome data warehouse luminaries Barry Devlin, Susan O’Connell, and Donald Farmer to discuss the evolution of data warehouses, data lakes, and lakehouses.