IRM UK Conference, March 11-14: https://irmuk.co.uk/dgmdm-2024-2-2/ - use code DM10 for a 10% discount!
Please Rate and Review us on your podcast app of choice!
Get involved with Data Mesh Understanding's free community roundtables and introductions: https://landing.datameshunderstanding.com/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding. Get in touch with Scott on LinkedIn.
Transcript for this episode (link) provided by Starburst. You can download their Data Products for Dummies e-book (info-gated) here and their Data Mesh for Dummies e-book (info gated) here.
Ole's LinkedIn: https://www.linkedin.com/in/ole-olesen-bagneux-2b73449a/
Piethein's LinkedIn: https://www.linkedin.com/in/pietheinstrengholt/
Samia's LinkedIn: https://www.linkedin.com/in/samia-rahman-b7b65216/
Liz's LinkedIn: https://www.linkedin.com/in/lizhendersondata/
Ole's book The Enterprise Data Catalog: https://www.oreilly.com/library/view/the-enterprise-data/9781492098706/
Piethein's book Data Management at Scale (2nd Edition): https://www.oreilly.com/library/view/data-management-at/9781098138851/
Liz's blog: https://lizhendersondata.wordpress.com/
In this episode, guest host Ole Olesen-Bagneux, Chief Evangelist at Zeenea (guest of episode #82) facilitated a discussion with Piethein Strengholt, CDO at Microsoft Netherlands (guest of episode #20), Liz Henderson AKA The Data Queen, a board advisor, non-executive director, and mentor in digital and data at Capgemini (guest of episode #106), and Samia Rahman, Director of Enterprise Data Strategy, Architecture, and Governance at SeaGen/Pfizer (guest of episode #67). As per usual, all guests were only reflecting their own views.
The topic for this panel was modernizing master data management (MDM) and applying that to data mesh. It's a very challenging topic to cover because even people's general definitions of MDM can be pretty different, and there is a question of simply mastering data versus trying to manage master data globally rather than locally. It's a very tricky topic in data mesh. I sometimes use the term 'mastered data' because I think it is far more applicable in that situation than 'master data' - just as audio goes through a mastering process, data must be mastered to actually reach a certain quality level. 'Master data' is more the core linking data. But even that is still just one person's definition.
Scott note: As per usual, I share my takeaways rather than trying to reflect the nuance of the panelists' views individually. Also, there was a bit of a misstep around the intros, in case they seem a bit lost in there.
Scott's Top Takeaways:
- The historical impression of MDM - striving for that single golden record - needs to change. Trying to head down that path in a federated/decentralized approach is even more difficult than in a centralized world. And the benefits keep proving not to be worth the costs.
- You need to consider your master data management strategy to do data mesh well. Do you want official sources of truth relative to specific questions? Who owns data quality for core linking data? Etc. You don't have to get it perfect at the start but if people don't trust the data on the mesh, your mesh implementation probably fails. Mastering data can improve the quality - especially the provability of that quality - and thus trust. And without strong linking data between data products, do you just have high quality data silos?
- Relatedly, there is a tipping point in an organization related to size and complexity where you need to consider data mesh. There is a tipping point in a mesh implementation where you need to really start to push MDM. It doesn't have to be on day one but you should plan from the start around MDM.
- MDM - at least done well - is not about getting to 'perfect' data, it's about understanding the needs of your organization and helping people get to the right quality level and providing core data when it's needed. Not all data needs mastering. Not all data is master data.
- There's a difference between 'single source of truth' and 'most trusted source of truth'. It's absolutely okay to designate 'sources of truth' for specific questions. Other data sources may provide different perspectives on the same topic - e.g. customer is different in sales, marketing, and finance - but there needs to be one right, repeatable answer for things like regulatory reporting or financial statements.
- One reason MDM is such a risk to data mesh is that, to be effective, some part of master data essentially has to be centrally managed. You need ways for domains to adhere to central standards, guidelines, policies, etc. Otherwise you risk silos. BUT centrally managed often leads to inflexibility. It can be tough to thread this needle. Look to provide the value add from MDM but limit the overhead and rigidity. How that will apply to your organization will be quite specific but go talk to others in your space to very specifically understand their approaches. No one-size-fits-all, no copy/paste. This is going to be very hard but don't skip it.
- If you are going to do MDM in your data mesh implementation, much like with anything in data mesh: test, learn, and iterate. Don't do a huge upfront implementation or it will cost far too much and limit your agility and flexibility far too much. Plan some aspects of your MDM implementation out ahead of time but trying to do everything at the start is a massive anti-pattern.
- There is a massive push and pull in data mesh - one where you have to find the right balance specific to your organization - between master data and maintaining the domain-level meaning and understanding of data in domain-specific data products. This is where the enterprise data warehouse often goes wrong: focusing on fitting the data together (master data) at the expense of its actual meaning and uniqueness. It's a balance. Scott note: I often call this local versus global maximization, this time related to business context :)
Other Important Takeaways (many touch on similar points from different aspects):
- Liz asked an interesting question: if we are creating true high-quality data products, have we by definition mastered the data inside? Is that good enough? The answer to the first is maybe. The answer to the second seems to be no: we need to look at how data in a data product fits in the broader scope of the organization's data needs or we're just creating high-quality data silos as data products. That is the master data versus mastered data distinction.
- Do we only need to consider data that is broadly reusable across the organization for mastering? It can be quite complex - and political too - figuring out what data is deserving of being mastered and what data should be master data.
- Historical MDM approaches have been extremely costly often without the return on investment to justify them. Look to consider a point Khanh Chau made way back in episode #44: if everyone is doing similar cleaning work instead of that being in the data product itself, your total cost of ownership skyrockets. In MDM, when there isn't a clear owner for data because it's so broadly used for linking, that may fall back to the data team. NOT IDEAL but such is life/the real world.
- Be super clear internally as to your definitions around what MDM means and its application. It will look different in any organization. Much like the fun "what is a data product?" conversation.
- Dive into the difference between reference data - e.g. standardized country codes - and master data. Data that can be combined but isn't of a high enough quality still needs to be mastered. It's not only about creating interoperable data but data you would actually want to use upon combining it.
- As Piethein mentioned, MDM should not be managed at the domain level. That creates a massive mess of contention.
- MDM is especially key in industries with lots of externally purchased data - e.g. financial services or life sciences - because many domains all leverage the same data. It saves time and money to have that managed once instead of 10s of times.
- Should data consumers push insights and data improvements back to their data producers? How do business logic and transformations of the data play into MDM? Does it matter what we call it as long as we allow value-add work to easily flow and become scalable?
- Because of the data products approach with very clear ownership, data mesh may actually make MDM easier than the traditional centralized approach. There is - at least there is supposed to be - clear lineage, documentation, other metadata, ownership, etc. so we can all understand more easily what data we should use and for what purposes. Or at least ask the owners for more information to get a better understanding of that.
- MDM without strong metadata ownership and management is just like data mesh without them: a disaster waiting to happen.
- MDM and DDD: there is a complex interplay between how MDM and Domain Driven Design work together. Every domain has its own unique 'language', even a somewhat unique language for communicating externally, but you do need some more broad language of the organization. Getting that crisp so your data products fit a broader taxonomy or ontology is going to be a MASSIVE challenge.
- If you don't have strong documentation and metadata management, you are far less likely to see a valuable MDM implementation, data mesh or otherwise. People will discover data and use it whether they really understand it or not.
Learn more about Data Mesh Understanding: https://datameshunderstanding.com/about
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf