#147 - Data Warehouses and Semantics Deep Dive, SDF, and more w/ Lukas Schulte (SDF)
Oct 9, 2023
Lukas Schulte, an expert in data warehousing and semantics, discusses the importance of semantics in data warehouses, the challenges of defining common terms in business and data analysis, and introduces SDF, a semantic data product. The conversation also covers the significance of context in data analysis and the availability of free versions of tools.
Semantics are important for a data warehouse to achieve consistent definitions and meanings for metrics like daily active users.
SDF provides a static analysis engine that attaches classifiers to columns, allowing for clear lineage and better control over data types and privacy rules.
Code checks in SDF help prevent currency mismatches and data leakage, ensuring consistency in business logic and data transformations.
Deep dives
SDF Labs and the Semantic Data Product
Lukas, CEO of SDF Labs, introduces the company's semantic data product, which aims to provide a better understanding of data warehouses from ingestion to consumption. He explains how the team's previous experience in sensor analytics and computer vision led them to address challenges in data retention, user data management, and understanding semantic meanings within organizations.
The Importance of Semantics and Business Logic
Lukas and Joe discuss the importance of semantics and business logic in data management. They highlight the challenges organizations face in achieving consistent definitions and meanings for metrics like daily active users. They also explore how companies currently tackle these challenges using human processes and bespoke SQL queries, and why there is a need for a more structured approach.
SDF's Solution: The Semantic Data Fabric
Lukas demonstrates how SDF works as a static analysis engine that provides end-to-end understanding of a data warehouse. He shows how classifiers, such as currency types and personally identifiable information (PII), can be attached to columns to create a semantic model. The classifiers propagate throughout the warehouse, allowing for clear lineage and better control over data types and privacy rules.
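To make the idea of classifier propagation concrete, here is a minimal Python sketch. It is not SDF's syntax or implementation: the column names, the classifier labels, the `LINEAGE` mapping, and the `propagate` helper are all hypothetical, and the lineage is assumed to be acyclic. It only illustrates how a tag attached to a source column could be inherited by every column derived from it.

```python
# Hypothetical classifiers attached to source columns (illustrative only,
# not SDF's real configuration format).
SOURCE_CLASSIFIERS = {
    "raw.users.email": {"PII"},
    "raw.orders.amount_usd": {"CURRENCY.USD"},
}

# Hypothetical column-level lineage: each derived column lists its inputs.
LINEAGE = {
    "staging.users.contact": ["raw.users.email"],
    "staging.orders.amount": ["raw.orders.amount_usd"],
    "analytics.revenue.total": ["staging.orders.amount"],
    "analytics.report.summary": ["staging.users.contact", "staging.orders.amount"],
}


def propagate(sources, lineage):
    """Return the classifiers of every column, inherited through lineage.

    Assumes the lineage graph has no cycles.
    """
    resolved = {col: set(tags) for col, tags in sources.items()}

    def resolve(col):
        if col not in resolved:
            tags = set()
            # A derived column inherits the union of its inputs' classifiers.
            for upstream in lineage.get(col, []):
                tags |= resolve(upstream)
            resolved[col] = tags
        return resolved[col]

    for col in lineage:
        resolve(col)
    return resolved


classified = propagate(SOURCE_CLASSIFIERS, LINEAGE)
for column in sorted(classified):
    print(column, sorted(classified[column]))
```

In this sketch the `analytics.report.summary` column ends up tagged with both `PII` and `CURRENCY.USD` even though nobody labeled it directly, which is the property that makes lineage-aware privacy rules possible.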
Code Checks and Error Detection
Lukas explains how code checks work in SDF: they are SQL queries that verify assertions against the information schema. He demonstrates how code checks can detect and prevent currency mismatches or data leakage in specific schemas, providing early feedback and ensuring consistency in business logic and data transformations.
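The general shape of such a check can be sketched with Python's built-in `sqlite3` module. This is an illustration under stated assumptions, not SDF's actual check format: the `column_info` table stands in for an information schema enriched with classifiers, and its layout, the sample rows, and the `run_check` helper are all hypothetical. Each check is a SQL query that returns the offending rows, so an empty result means the assertion holds.

```python
import sqlite3


def run_check(conn, sql):
    """Run a check query; an empty result means the check passes."""
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for an information schema that carries classifiers.
conn.execute(
    "CREATE TABLE column_info ("
    "schema_name TEXT, table_name TEXT, column_name TEXT, classifier TEXT)"
)
conn.executemany(
    "INSERT INTO column_info VALUES (?, ?, ?, ?)",
    [
        ("raw", "users", "email", "PII"),
        ("raw", "orders", "amount_usd", "CURRENCY.USD"),
        ("analytics", "revenue", "total", "CURRENCY.USD"),
        ("analytics", "revenue", "refund", "CURRENCY.EUR"),
        ("analytics", "revenue", "customer", "PII"),
    ],
)

# Check 1: no PII may reach the analytics schema (data leakage).
NO_PII_IN_ANALYTICS = """
SELECT table_name, column_name
FROM column_info
WHERE schema_name = 'analytics' AND classifier = 'PII'
"""

# Check 2: a single table must not mix currencies (currency mismatch).
NO_MIXED_CURRENCIES = """
SELECT schema_name, table_name
FROM column_info
WHERE classifier LIKE 'CURRENCY.%'
GROUP BY schema_name, table_name
HAVING COUNT(DISTINCT classifier) > 1
"""

pii_violations = run_check(conn, NO_PII_IN_ANALYTICS)
currency_violations = run_check(conn, NO_MIXED_CURRENCIES)
print("PII leaks:", pii_violations)
print("Mixed currencies:", currency_violations)
```

Because the checks run against metadata rather than the data itself, they could in principle be evaluated before any transformation executes, which is where the early feedback comes from.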
Benefits and Future Applications
The podcast hosts discuss the benefits of SDF in improving data governance, data retention, and business process understanding. They also touch on the potential for SDF to be used in ETL processes, schema migrations, and CI/CD pipelines to facilitate the shift-left movement in data engineering. They conclude by highlighting the need for more control plane technologies like SDF in the data management space.
Why are semantics important for a data warehouse? Lukas Schulte joins us to chat about why they matter, the heterogeneity of data systems, how semantics relate to SQL compilers, his project SDF, and much more. Please be aware that this discussion will get into the nitty-gritty and technical weeds of all things data.