The Role of a Data Catalog in Automation and Data Product Governance
There is value in having human expertise in data analysis, but technology is improving and automation can also be beneficial. Libraries used to rely on humans to catalog books, whereas search engines now rely on algorithms optimized with machine learning. Humans will still be involved, but more behind the scenes. A data catalog can help determine what should be automated. While some tasks, like dealing with misspellings, can be automated, others may be better left as one-off projects. It's important to balance automation with human understanding and not rely solely on technology.
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here
Links:
Ole's Book (O'Reilly Early Release): https://www.oreilly.com/library/view/the-enterprise-data/9781492098706/
Ole's LinkedIn: https://www.linkedin.com/in/ole-olesen-bagneux-2b73449a/
Ole's Other Recommended Reading:
Zhamak Dehghani's Data Mesh book: https://www.oreilly.com/library/view/data-mesh/9781492092384/
Piethein Strengholt's Data Management at Scale book: https://www.oreilly.com/library/view/data-management-at/9781492054771/
The Elements of Knowledge Organization: https://link.springer.com/book/10.1007/978-3-319-09357-4
In this episode, Scott interviewed Ole Olesen-Bagneux, an Enterprise Architect who focuses on data at GN and the author of an upcoming book on data catalogs with O'Reilly. To be clear, Ole was only representing himself and not GN.
The two main topics, which are somewhat intertwined, were: 1) how can we better understand and handle the concept of a domain when discussing data; and 2) how can we build systems that better enable us to search "for" data, not just search "in" data that we know exists?
Some practical advice and general conclusions from Ole:
Ole started the conversation by sharing his view that Domain Driven Design (DDD) has some shortcomings when applied to data, especially for data domain mapping. In his view, DDD is overly tied to software engineering, so there is too much of a technical bent to understanding and even mapping out domains. He recommends taking domain analysis and domain theory learnings from the Library and Information Sciences discipline, using those to start your domain mapping, and then bringing in DDD once you have a good initial understanding of your domains. DDD and domain analysis can work together harmoniously and don't really contradict each other, but domain analysis puts the knowledge first rather than the technology.
While Ole was inspired by Zhamak's book as well as the book by Piethein Strengholt, he believes domain analysis lowers the significant friction and often frustration organizations feel when trying to start doing DDD for data. Domain analysis digs much more into what the domain does and why instead of how the domain communicates via software. He believes that data mesh should focus more on the information sharing and less on the software and that DDD will overcomplicate your domain mapping.
For Ole, DDD is overly concerned with modeling domains into software but you need to get to a deeper understanding of your domains and organization first before focusing in on your model. It may be that you truly can't fully communicate your domain's context in a data model either and it's good to know that upfront and take steps to communicate in other ways, such as enhanced documentation.
Ole believes we focus too much on the data model and that often sends people down the path of overly technically-focused solutions. Other guests have mentioned that documentation around data sets and data products is often much more focused on the technical aspects, not actually describing the information represented by the data. How do we store our data so we can make it usable for humans, not just software? We need to make it searchable, findable, understandable, etc., at both the micro level - a dataset or data product - and the overall macro data mesh level.
Building a semantic model is at least as important as a data model for Ole. We need to again focus on that searchability but what different search capabilities do we need? A simple search experience on a keyword or two, browsing what data is available, complicated queries with filters, etc. Can we enable querying by data lineage, by relationships with a knowledge graph, across a domain, etc.? It's a different way of approaching data that is not similar to a data model.
For a functional data mesh, Ole believes there needs to be a big focus on the metadata layer. You need to enable data consumers to find what data exists in their "knowledge universe"; focus on also serving the use case of searching "for" data, not just the typical searching "in" data for a specific answer.
Possible ways forward on semantic knowledge sharing should come from the Library and Information Sciences space in Ole's view. They've been doing this for centuries in one way or another. We need to start thinking in a metadata way to move forward. And we need the industry to help develop better metadata tooling and for data practitioners to focus on what metadata tooling they have, working to harmonize data across those tools. It doesn't have to all happen at once either, we can work from domain to domain to focus on that metadata harmonization.
Ole finished the conversation talking about the fine balance between leveraging tooling and trying to do everything with tooling. There will be important roles for humans in the middle of knowledge sharing - whether they will be more consumer facing like a data concierge or more behind the scenes, we shall see, but Ole bets it will be the latter.
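The search modes mentioned above (a simple keyword search, browsing what data is available in a domain, and querying by data lineage) can be illustrated with a very small sketch of a metadata layer. This is not something Ole described in the episode; the `CatalogEntry` record, the sample catalog, and all function names below are hypothetical and exist only to show the difference between searching "for" data and searching "in" it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogEntry:
    """Hypothetical metadata record describing one data product."""
    name: str
    domain: str
    description: str
    # Names of the data products this one is derived from (its lineage).
    upstream: List[str] = field(default_factory=list)


# Hypothetical catalog contents, invented purely for illustration.
CATALOG = [
    CatalogEntry("orders", "sales", "Customer orders placed in the web shop"),
    CatalogEntry("shipments", "logistics", "Parcels shipped to customers", ["orders"]),
    CatalogEntry("revenue_report", "finance", "Monthly revenue per region",
                 ["orders", "shipments"]),
]


def search_by_keyword(catalog, keyword):
    """Search "for" data: match a keyword against names and descriptions."""
    needle = keyword.lower()
    return [e.name for e in catalog
            if needle in e.name.lower() or needle in e.description.lower()]


def browse_domain(catalog, domain):
    """Browse everything a single domain has published."""
    return [e.name for e in catalog if e.domain == domain]


def downstream_of(catalog, source):
    """Lineage query: data products that depend on `source`, directly or indirectly."""
    found = []
    frontier = [source]
    while frontier:
        current = frontier.pop()
        for entry in catalog:
            if current in entry.upstream and entry.name not in found:
                found.append(entry.name)
                frontier.append(entry.name)
    return found


if __name__ == "__main__":
    print(search_by_keyword(CATALOG, "customer"))
    print(browse_domain(CATALOG, "finance"))
    print(downstream_of(CATALOG, "orders"))
```

None of these functions touches the data itself; they only read metadata, which is the point of the searching "for" versus searching "in" distinction. A real catalog would add the human-written context Ole argues for, such as richer descriptions and semantic relationships.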
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used in this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf