Data contracts: What are they and why do they matter?
Nov 14, 2024
In this discussion, Andrew Jones, an independent data consultant and creator of the data contract concept, teams up with Ryan Collingwood, a data strategist who is writing a book on data contracts. They dive into the significance of data contracts, likening them to APIs for stable data transfer. The conversation highlights challenges in maintaining reliable data interfaces and the importance of collaboration between data producers and consumers. Additionally, they explore how tailored data contracts can enhance data quality and adapt to future organizational needs.
Data contracts act like APIs for data, keeping data transfer stable and reliable, which is crucial for dependable data-driven applications.
They facilitate structured interactions between data producers and consumers, improving data management and enabling effective use for analytics and AI.
Deep dives
Understanding Data Contracts
Data contracts serve as a crucial interface for data similar to how APIs function in software development. They provide a layer of abstraction that enhances the reliability and accessibility of data by ensuring consistent data streams despite underlying database schema changes. By facilitating structured interaction between data producers and consumers, data contracts help clarify the context and meaning of the data, as well as its reliability and timeliness. This not only improves the management of data but also enables organizations to leverage data more effectively for applications like machine learning and artificial intelligence.
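To make the abstraction idea concrete, here is a minimal Python sketch rather than anything described in the episode. All names in it (the contract fields, the column maps, and the helper `to_contract_record`) are hypothetical. It assumes the producer keeps a mapping from contract fields to internal columns, so that a column rename in the database changes only the mapping and not what consumers receive.

```python
# Hypothetical contract: the field names that consumers are promised.
CONTRACT_FIELDS = ("customer_id", "signup_date")

# Hypothetical mappings from contract field to internal column name.
# Only the mapping changes when the underlying schema changes.
COLUMN_MAP_V1 = {"customer_id": "cust_id", "signup_date": "created"}
COLUMN_MAP_V2 = {"customer_id": "customer_key", "signup_date": "created_at"}


def to_contract_record(row, column_map):
    """Project an internal database row onto the contract's fields."""
    return {field: row[column_map[field]] for field in CONTRACT_FIELDS}


# The same logical record before and after an internal schema change.
old_row = {"cust_id": 42, "created": "2024-01-05", "internal_flag": True}
new_row = {"customer_key": 42, "created_at": "2024-01-05", "shard": 3}

record_v1 = to_contract_record(old_row, COLUMN_MAP_V1)
record_v2 = to_contract_record(new_row, COLUMN_MAP_V2)
```

In this sketch both rows yield the same contract-shaped record, and internal-only columns never reach the consumer.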
The Distinction from Traditional APIs
Unlike traditional APIs that typically handle small data interactions, data contracts are designed for managing larger datasets essential for purposes such as data analytics and model training. The communication format adopted for data contracts can vary, often taking the form of YAML or JSON, allowing organizations to tailor interfaces to their existing development environments. This flexibility ensures that the implementation is user-friendly, reducing friction and enhancing adoption by teams with varying levels of technical expertise. Furthermore, data contracts prioritize describing what data is available and how it can be utilized, rather than focusing solely on the technical specifications underlying the data.
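As an illustration of what a JSON-formatted contract might look like, the following Python sketch defines a small contract and checks records against it. The contract layout, the dataset and field names, and the `validate` helper are assumptions made for this example; they do not follow any particular data contract standard and are not taken from the episode.

```python
import json

# Hypothetical contract document: it describes what data is available,
# who owns it, and what each field means at a basic level.
CONTRACT_JSON = """
{
  "name": "customer_signups",
  "owner": "growth-team",
  "description": "One record per customer signup.",
  "fields": [
    {"name": "customer_id", "type": "integer", "required": true},
    {"name": "signup_date", "type": "string", "required": true},
    {"name": "referrer", "type": "string", "required": false}
  ]
}
"""

# Map the contract's type names onto Python types (assumed vocabulary).
PYTHON_TYPES = {"integer": int, "string": str}


def validate(record, contract):
    """Return a list of violations; the list is empty if the record conforms."""
    problems = []
    for field in contract["fields"]:
        name = field["name"]
        if name not in record:
            if field["required"]:
                problems.append(f"missing required field: {name}")
            continue
        if not isinstance(record[name], PYTHON_TYPES[field["type"]]):
            problems.append(f"wrong type for {name}: expected {field['type']}")
    return problems


contract = json.loads(CONTRACT_JSON)
good = {"customer_id": 42, "signup_date": "2024-01-05"}
bad = {"customer_id": "42"}
```

Calling `validate(good, contract)` returns an empty list, while `validate(bad, contract)` reports the wrongly typed `customer_id` and the missing `signup_date`. A YAML version of the same contract would work equally well with a YAML parser in place of `json`.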
Implementing Data Contracts in Organizations
When defining data contracts, organizations should consider the needs of their data contract owners and the technical expertise within their teams to ensure practical implementation. Utilizing familiar tools and formats can ease the transition, making it easier for data producers and consumers to create and maintain the contracts. Establishing a standardized format for data contracts can streamline integration with other tools, such as data catalogs and governance platforms, enhancing overall data quality. By fostering collaboration and understanding the processes that generate data, organizations can develop data contracts that serve as both technical specifications and practical guidelines.
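One way to picture how a standardized contract format eases integration with other tools is the sketch below, which derives a catalog-style entry from a contract. The contract layout, the shape of the catalog entry, and the function `to_catalog_entry` are all hypothetical; real data catalogs and governance platforms define their own schemas and APIs.

```python
# Hypothetical contract, already parsed from YAML or JSON into a dictionary.
contract = {
    "name": "customer_signups",
    "owner": "growth-team",
    "description": "One record per customer signup.",
    "fields": [
        {"name": "customer_id", "type": "integer"},
        {"name": "signup_date", "type": "string"},
    ],
}


def to_catalog_entry(contract):
    """Derive a flat, catalog-style summary from a contract dictionary."""
    return {
        "dataset": contract["name"],
        "owner": contract["owner"],
        "summary": contract["description"],
        "columns": [field["name"] for field in contract["fields"]],
    }


entry = to_catalog_entry(contract)
```

Because every contract shares one layout in this sketch, a single converter like this can serve all datasets instead of one bespoke script per team.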
The Future of Data Contracts
The growing importance of data quality drives the demand for data contracts, which are increasingly seen as essential for organizations aiming to harness data for strategic objectives. As data contracts gain traction, evolving standards in the field may lead to better integration and interoperability within data platforms. The future could see a move toward unified formats that simplify the conversion processes necessary for different tools and services within organizations. Ultimately, this evolution will empower teams to better manage data quality challenges while translating insights into actionable business strategies.
Data contracts are a bit like APIs for data — they make it possible to interface with data in a way that ensures the transfer of data from one place to another is stable and reliable. This is particularly important for building more reliable data-driven applications.
To discuss data contracts, host Lilly Ryan is joined on the Technology Podcast by Andrew Jones, the creator of the data contract concept (in 2021) and author of Driving Data Quality with Data Contracts (2023), and Thoughtworker Ryan Collingwood, who is currently writing their own book on data contracts, due to be published in 2025. Andrew and Ryan offer their perspectives on the topic, explaining the origins and motivation for the idea and outlining how data contracts can be used in practice.