132: Data Quality and Data Contracts with Chad Sanderson of Data Quality Camp
Mar 29, 2023
Chad Sanderson of Data Quality Camp discusses data quality and data contracts. Topics include how data quality breaks down, what data contracts are and why they are valuable, the tools needed to make data contracts effective, and the importance of community in data quality.
Data contracts ensure data quality and integrity, going beyond traditional APIs to consider semantic and logical layers.
Implementing data contracts involves communication, collaboration, and the use of tools like schema registries and monitoring tools.
Education and community are crucial for promoting the understanding, adoption, and improvement of data contracts and data quality practices.
Deep dives
Importance of Data Contracts and Data Quality
Data contracts are agreements between producers and consumers that ensure data quality at scale. They serve as a form of data API, going beyond traditional APIs by considering not just the schema but also the integrity of the data itself. Data contracts help ensure that data products work as intended and meet requirements. Programmatic checks enforced at various stages, such as the schema registry, the serialization framework, staging tables, and CI/CD processes, keep data consistent, trustworthy, and aligned with semantic standards. A strong foundation of trustworthy data pipelines, clear ownership, and managed schema evolution is crucial when implementing data contracts.
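To make the idea of a contract that covers both schema and semantics more concrete, here is a minimal Python sketch. It is not taken from the episode: the `Field` and `Contract` classes, the `orders` contract, and its rules are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass(frozen=True)
class Field:
    """One column in a hypothetical contract: a type plus an optional semantic rule."""
    name: str
    type_: type
    nullable: bool = False
    check: Optional[Callable[[Any], bool]] = None  # semantic rule, beyond the schema


@dataclass(frozen=True)
class Contract:
    """A hypothetical producer/consumer agreement for one dataset."""
    name: str
    owner: str
    fields: Tuple[Field, ...]

    def violations(self, record: dict) -> list:
        """Return a human-readable list of ways the record breaks the contract."""
        problems = []
        for f in self.fields:
            if f.name not in record:
                problems.append(f"{f.name}: missing")
                continue
            value = record[f.name]
            if value is None:
                if not f.nullable:
                    problems.append(f"{f.name}: null not allowed")
                continue
            if not isinstance(value, f.type_):
                problems.append(f"{f.name}: expected {f.type_.__name__}")
                continue
            if f.check is not None and not f.check(value):
                problems.append(f"{f.name}: failed semantic check")
        return problems


# Illustrative contract: the schema says amount_cents is an int,
# the semantic rule says it must also be non-negative.
ORDERS = Contract(
    name="orders",
    owner="checkout-team",
    fields=(
        Field("order_id", str),
        Field("amount_cents", int, check=lambda v: v >= 0),
        Field("currency", str, check=lambda v: v in {"USD", "EUR"}),
    ),
)

good = {"order_id": "o-1", "amount_cents": 1200, "currency": "USD"}
bad = {"order_id": "o-2", "amount_cents": -5, "currency": "USD"}
print(ORDERS.violations(good))
print(ORDERS.violations(bad))
```

The second record passes a schema-only check (every field has the right type) but still violates the contract, which is the distinction between a schema and a contract drawn above.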
Challenges and Solutions in Data Contracts Implementation
Implementing data contracts involves addressing challenges such as semantic differences, conflicting interpretations, and evolving logic. Communication and collaboration between producers and consumers are essential. Technical communication can happen through pull requests, where the data pipeline, lineage, constraints, and requirements are discussed. Bridging the gap between non-technical and technical consumers, however, requires additional layers of abstraction and communication. Tools such as schema registries, data profiles, monitoring tools, and CI/CD processes can help enforce data contracts and ensure data quality. The key to a successful implementation is a centralized agreement on the most important aspects, with room for decentralization in specialized use cases.
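One way a CI/CD check on a pull request might surface contract-breaking changes is to compare the schema before and after the change. The following Python sketch assumes schemas are plain dictionaries mapping field names to a `type` and an optional `nullable` flag; that representation and the `breaking_changes` function are hypothetical, not the format of any particular schema registry.

```python
def breaking_changes(old: dict, new: dict) -> list:
    """List changes in `new` that could break consumers relying on `old`.

    Adding a field is treated as safe; removing a field, changing its type,
    or making it nullable is treated as breaking.
    """
    problems = []
    for name, spec in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
            continue
        if new[name]["type"] != spec["type"]:
            problems.append(
                f"type changed: {name} {spec['type']} -> {new[name]['type']}"
            )
        if new[name].get("nullable", False) and not spec.get("nullable", False):
            problems.append(f"became nullable: {name}")
    return problems


# Schema on the main branch versus the schema proposed in a pull request.
old_schema = {
    "order_id": {"type": "string"},
    "amount_cents": {"type": "int"},
}
new_schema = {
    "order_id": {"type": "string"},
    "amount_cents": {"type": "float"},
    "note": {"type": "string", "nullable": True},
}

problems = breaking_changes(old_schema, new_schema)
print(problems)
```

In a pipeline, a non-empty result could fail the build or simply notify the affected consumers, so that the pull request becomes the place where the conversation about the change happens.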
The Role of Education and Community in Data Contracts
Education and community play crucial roles in promoting the understanding, adoption, and evolution of data contracts. Education helps to create awareness of the importance of data quality and the benefits of implementing data contracts. Building a community allows data practitioners to share knowledge, experiences, and best practices, facilitating learning from each other. Communities provide a platform for discussions, asking questions, addressing challenges, and finding solutions. By being part of a community, data engineers and platform engineers can gain insights, tools, and perspectives to navigate conversations with producers and consumers, propose changes, and effectively address data quality issues. The community acts as a vessel for change, supporting the continuous improvement of data contracts and data quality practices.
Importance of Data Contracts in Data Handoffs
Data contracts need to exist anytime data is handed off from one team to another, such as from a Postgres database to a data lake or from a data lake to a data warehouse. These contracts serve as the API for data and ensure that the data is transformed and consumed correctly. The goal is to shift ownership of contracts to the left, embedding enforcement in the developer workflow. The enforcement mechanism depends on the stage in the pipeline; for example, change data capture (CDC) and an event bus can be used for detection and alerting. For producers, the awareness that data contracts provide shows how their data is being used and helps them avoid unintended consequences. For consumers, data contracts ensure higher-quality data for critical use cases.
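As a rough illustration of detection and alerting on a stream of change events, the Python sketch below checks each event against the columns a consumer expects. The event shape (`op`, `table`, `after`), the `EXPECTED_COLUMNS` mapping, and the `detect_violations` function are assumptions made for this example and do not reflect the format of any specific CDC tool or event bus.

```python
from typing import Iterable, Iterator

# Hypothetical expectations a consumer has registered for the "orders" table.
EXPECTED_COLUMNS = {
    "orders": {"order_id": str, "amount_cents": int},
}


def detect_violations(events: Iterable[dict]) -> Iterator[str]:
    """Yield an alert message for each change event that breaks expectations."""
    for event in events:
        table = event["table"]
        expected = EXPECTED_COLUMNS.get(table)
        # Tables without a contract and delete events are not checked.
        if expected is None or event["op"] == "delete":
            continue
        row = event["after"]
        missing = sorted(expected.keys() - row.keys())
        if missing:
            yield f"{table}: missing columns {missing}"
        for column, expected_type in expected.items():
            value = row.get(column)
            if value is not None and not isinstance(value, expected_type):
                yield (
                    f"{table}.{column}: expected {expected_type.__name__}, "
                    f"got {type(value).__name__}"
                )


# Simulated change events, standing in for messages read from an event bus.
events = [
    {"op": "insert", "table": "orders",
     "after": {"order_id": "o-1", "amount_cents": 100}},
    {"op": "update", "table": "orders",
     "after": {"order_id": "o-1", "amount_cents": "100"}},
    {"op": "insert", "table": "orders",
     "after": {"order_id": "o-2"}},
]

alerts = list(detect_violations(events))
for alert in alerts:
    print(alert)
```

A check like this only detects and reports problems after the change has happened; blocking the change earlier in the developer workflow is what shifting ownership to the left aims for.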
Starting the Conversation and Implementing Data Contracts
The driving force behind implementing data contracts is often the data engineering or data platform team, because they feel the pain of poor data communication most directly. Implementation should start with small, incremental changes that demonstrate business value. Contracts do not need to be implemented everywhere; the focus should be on specific areas where the ROI is justified, such as analytics and high-value data products. Building awareness infrastructure and fostering good relationships between producers and consumers can provide immediate benefits. Smaller companies can start by getting involved in the change conversation and gradually introduce contracts where there is measurable value and shared ownership.
What are data contracts and how do they work? (17:41)
Implicit contracts at companies (24:01)
Where do data contracts fit in data infrastructure? (28:14)
The value of data contracts to the producer and consumer (31:18)
Tools needed in effective data contracts (46:13)
The importance of community in data quality (50:53)
Getting connected to Data Quality Camp (1:00:55)
Final thoughts and takeaways (1:01:53)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.