#78 - Data Warehousing in 2022, Textual ETL, and More w/ Bill Inmon
Apr 18, 2022
Bill Inmon, the father of the data warehouse, discusses textual ETL and data warehousing in 2022. He explores the challenges of analyzing text data and how textual ETL technology addresses them. The conversation also covers Forest Rim Technology, IBM's Watson text analytics, and the misconception that data mesh replaces data warehouses.
Organizations often overlook the business value of text data in data warehousing. In electronic health records, for example, most of each record is text, which makes it hard to analyze large numbers of patient records simultaneously.
Textual ETL technology allows organizations to efficiently convert text data into structured data, enabling analysis of significant numbers of records and identifying patterns related to diseases or conditions.
Deep dives
The Value of Text in Data Warehousing
The podcast episode explores the importance of text in data warehousing. The speaker highlights the business value of text data that is often ignored by organizations. The example of Electronic Health Records (EHR) demonstrates how the majority of the record is in the form of text, which presents challenges when analyzing large amounts of patient data simultaneously. The speaker emphasizes that while current database management systems are designed for repetitive data, they struggle with non-repetitive text data. Additionally, the speaker mentions the complexity of handling multiple languages within text data.
Transforming Text into a Relational Database
The podcast discusses the concept of textual ETL, a technology that converts text into a form that can be stored in a standard relational database management system. By utilizing textual ETL, organizations can transform text data to allow for more efficient analysis of large datasets. The advantage of this approach is the ability to look at a significant number of records simultaneously, such as analyzing medical records to identify patterns related to certain diseases or conditions. There are some drawbacks, such as the loss of stop words and less-than-perfect accuracy, but the speaker argues that the benefits of transforming text into structured data greatly outweigh these limitations.
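As a rough illustration of the general idea (not the actual Textual ETL product discussed in the episode, whose internals are not described here), the sketch below splits free text into words, drops stop words, and loads the rest into a relational table so that many documents can be queried at once. The stop-word list, table name, function names, and sample notes are all hypothetical and made up for this example.

```python
import re
import sqlite3

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "and", "the", "of", "with", "was", "is", "for", "to", "in", "has"}


def text_to_rows(doc_id, text):
    """Turn free text into (doc_id, position, word) rows, dropping stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [(doc_id, pos, w) for pos, w in enumerate(words) if w not in STOP_WORDS]


def load_notes(notes):
    """Load a {doc_id: text} mapping into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE note_words (doc_id INTEGER, position INTEGER, word TEXT)")
    for doc_id, text in notes.items():
        conn.executemany("INSERT INTO note_words VALUES (?, ?, ?)", text_to_rows(doc_id, text))
    conn.commit()
    return conn


def docs_mentioning(conn, word):
    """Return the sorted ids of documents that contain the given word."""
    rows = conn.execute(
        "SELECT DISTINCT doc_id FROM note_words WHERE word = ? ORDER BY doc_id", (word,)
    )
    return [r[0] for r in rows]


# Made-up sample notes, not real patient data.
NOTES = {
    1: "Patient presents with a persistent cough and fever.",
    2: "Follow-up for knee pain; no fever reported.",
    3: "The patient has a mild cough.",
}

if __name__ == "__main__":
    conn = load_notes(NOTES)
    print(docs_mentioning(conn, "cough"))
```

Note that a query for "fever" in this toy version would also match note 2, which says "no fever". A word-level table with no notion of context is exactly the kind of accuracy limitation mentioned above, and a real system would need to handle negation and context.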
Challenges and Misconceptions in Text Data Handling
The podcast episode highlights the challenges organizations face when handling text data. The speaker explains that standard database management systems are not designed to handle the complex nature of text, leading to difficulties in managing inconsistency and variations in language. The need to understand context in addition to text further complicates text data analysis. The speaker also notes that misconceptions and lack of awareness contribute to organizations ignoring the potential of text data. The podcast emphasizes the importance of recognizing the value of text data and investing in appropriate technologies to overcome these challenges.
Data Warehousing vs. Snowflake and the Data Mesh Movement
In an article by the speaker, the distinctions between data warehousing and technologies like Snowflake are explored. The article clarifies that Snowflake is a general-purpose database management system rather than a dedicated data warehouse solution. The author cautions against labeling data projects built on Snowflake as data warehouses, since doing so can create false expectations and misplaced blame on the data warehouse concept itself. The speaker also highlights the importance of educating people on the true essence and purpose of data warehousing to avoid the confusion caused by vendors co-opting the term.
Bill Inmon (the father of the data warehouse, and an early pioneer of the data industry) joins the show to chat about textual ETL, data warehousing in 2022, and whatever else he wants to talk about.
Today is Bill's show, and we're just excited you're a part of it.
Streamed live on YouTube and LinkedIn.
#datawarehouse #dataengineering #textualetl
---------------------------------
TERNARY DATA
We are Matt and Joe, and we’re "recovering data scientists". Together, we run a data architecture company called Ternary Data. Ternary Data is not your typical data consultancy. Get no-nonsense, no BS data engineering strategy, coaching, and advice. Trusted by great companies, both huge and small.
Subscribe to our newsletter, or check out our services at Ternary Data Site - https://ternarydata.com