825: Data Contracts: The Key to Data Quality, with Chad Sanderson
Oct 8, 2024
Chad Sanderson, CEO of Gable.ai and an expert in data quality and governance, shares insights on the transformative power of data contracts in modern data management. He explains how these contracts clarify expectations for data quality and promote better alignment between data producers and consumers. The conversation covers 'shifting left' practices that tackle problems early, concerns about data debt, and the crucial role of human oversight. Chad also highlights storytelling as a tool for data teams to enhance communication and effectiveness.
Data contracts establish clear expectations for data quality and facilitate accountability between producers and consumers, enhancing overall data reliability.
Shifting data quality checks closer to production enables proactive issue detection and fosters collaboration among data teams in governance practices.
Data debt results from unmanaged changes and poor communication. It produces complex architectures, and managing it effectively requires open dialogue between teams.
Deep dives
Understanding Data Contracts
Data contracts are formal agreements between data producers and consumers, addressing persistent data quality and governance challenges. They define expectations related to data schema, quality, and semantics, similar to service contracts used in software engineering. The podcast emphasizes how data contracts facilitate clearer communication and accountability between data teams, thus improving data reliability. This structured approach aims to minimize the disruption caused by changes in data, such as modifications in formats or omitted fields.
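To make the idea concrete, here is a minimal sketch of what such a contract could look like in code. It is an illustration only, not a format described in the episode or used by any particular tool: the `FieldSpec` and `DataContract` classes, the `orders` dataset, the `checkout-team` owner, and the field names are all hypothetical. The sketch covers the three kinds of expectation named above: schema (field names and types), quality (a per-field check), and semantics (a plain-language description).

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class FieldSpec:
    """One field of the contract (hypothetical structure for illustration)."""
    name: str
    dtype: type                                    # schema expectation
    required: bool = True
    check: Optional[Callable[[Any], bool]] = None  # quality expectation
    description: str = ""                          # semantic expectation


@dataclass(frozen=True)
class DataContract:
    """An agreement between a producing team and its consumers."""
    dataset: str
    owner: str
    fields: tuple

    def violations(self, record: dict) -> list:
        """Return human-readable contract violations for one record."""
        problems = []
        for spec in self.fields:
            value = record.get(spec.name)
            if value is None:
                # Absent (or null) fields only matter when they are required.
                if spec.required:
                    problems.append(f"{spec.name}: missing required field")
                continue
            if not isinstance(value, spec.dtype):
                problems.append(
                    f"{spec.name}: expected {spec.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
                continue
            if spec.check is not None and not spec.check(value):
                problems.append(f"{spec.name}: failed quality check")
        return problems


# Hypothetical contract for an "orders" dataset.
orders_contract = DataContract(
    dataset="orders",
    owner="checkout-team",
    fields=(
        FieldSpec("order_id", str, description="Unique identifier of the order"),
        FieldSpec("amount_usd", float, check=lambda v: v >= 0,
                  description="Order total in US dollars"),
        FieldSpec("coupon_code", str, required=False),
    ),
)

good = orders_contract.violations({"order_id": "A1", "amount_usd": 12.5})
bad = orders_contract.violations({"amount_usd": -1.0})
```

Here `good` is an empty list, while `bad` reports both the omitted `order_id` and the negative amount. Recording an owner on the contract reflects the accountability aspect: a consumer who sees a violation knows which team to talk to.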
Shifting Left in Data Management
The concept of shifting left in data management focuses on moving quality checks and governance closer to the data production phase. Historically, data quality measures were applied reactively, often leading to significant issues later in the data pipeline. By integrating data governance practices into the development workflow of data producers, organizations can prevent potential problems before they escalate. This proactive strategy enables collaboration and ensures that all teams involved understand the implications of their changes on the downstream data consumers.
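One way to picture shifting left is a check that runs in the producer's own development workflow, before a change is merged, rather than after bad data has reached consumers. The sketch below is a hypothetical illustration, not a description of Gable's product or of any specific tool: the `breaking_changes` function, the schema dictionaries, and the type names are assumptions. It compares the schema consumers rely on with the schema a proposed change would produce and reports anything that would break them.

```python
def breaking_changes(contracted: dict, proposed: dict) -> list:
    """Compare a proposed schema against the contracted one.

    Both arguments map field names to type names. Removing or retyping a
    contracted field is treated as breaking; adding a new field is not.
    """
    problems = []
    for name, dtype in sorted(contracted.items()):
        if name not in proposed:
            problems.append(f"removed field: {name}")
        elif proposed[name] != dtype:
            problems.append(f"type change: {name} {dtype} -> {proposed[name]}")
    return problems


# Hypothetical schemas: what consumers depend on today ...
contracted_schema = {"order_id": "string", "amount_usd": "double"}
# ... and what the producer's pending change would emit.
proposed_schema = {"order_id": "string", "amount": "double", "currency": "string"}

issues = breaking_changes(contracted_schema, proposed_schema)
```

In this example `issues` contains a single entry, `removed field: amount_usd`, because the rename drops a field that consumers depend on. A pre-merge job could fail whenever the list is non-empty, which gives the producer a prompt to talk to downstream teams before the change ships.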
Impact of Data Debt
Data debt accumulates when changes to data are not properly managed or communicated to downstream consumers, resulting in complex data architectures. The podcast outlines how this can lead to what is termed 'spaghetti data architecture,' where data becomes increasingly convoluted and difficult to navigate. Consumers often add filters and adjustments in response to upstream data changes, creating additional complexity without addressing the root issues. Managing data debt effectively requires an open dialogue between teams to maintain clarity and intent in data representation.
Trust and Understanding in Data Relationships
Trust between data producers and consumers is essential for effective data utilization and governance. It is built on two main components: understanding the context of the data and consistently meeting expectations. Producers need to be aware of how their data is being used and the implications of any changes they make, fostering stronger collaboration with consumers. Clear communication about data meanings and potential impacts enhances trust, allowing teams to work more effectively together in achieving their data goals.
Cultural Shifts in Data Management
The podcast suggests that many challenges faced in data quality are rooted in cultural issues rather than technology shortcomings. As organizations shift toward utilizing a more federated model of data governance, expectations and responsibilities for data quality must be distributed across all teams. This cultural change is necessary to ensure that all data stakeholders take ownership of the quality and governance of their data. Enhancing communication and collaboration through storytelling techniques can help bridge gaps, ultimately leading to improved data management outcomes.
Data contracts are redefining data quality and governance, and Chad Sanderson, CEO of Gable.ai, joins host Jon Krohn to explain how they can transform your data strategy. He breaks down what data contracts are, how they shift data quality checks closer to production, and why they’re essential for reducing data debt. Chad also highlights how better alignment between data producers and consumers can elevate data reliability and tackle change-management challenges in modern organizations.
This episode is brought to you by epic LinkedIn Learning instructor Keith McCormick, and by Gurobi, the Decision Intelligence Leader. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.
In this episode you will learn:
What data contracts are and how they define expectations for data quality [03:16]
What data contracts look like [09:09]
The common misconceptions about data quality when implementing AI [12:55]
Chad’s Chief Operator role at Data Quality Camp [19:46]
How “shifting left” improves data reliability by addressing issues early [24:17]
Why data professionals still struggle with data quality [30:31]
How data debt forms and why it leads to complex, inefficient architectures [35:53]
How the role of human oversight will evolve in ensuring data quality [47:12]