Data Mesh Radio cover image

Data Mesh Radio

Latest episodes

Nov 9, 2022 • 15min

#152 Zhamak's Corner 8 - Are We Using Tech Gold as a Paperweight?

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Sponsored by NextData, Zhamak's company that is helping ease data product creation.
For more great content from Zhamak, check out her book on data mesh, a book she collaborated on, her LinkedIn, and her Twitter.
What tech is already available that could be used for data mesh? There are so many amazing approaches and technologies in data, but they've been used only for the pipeline approach. We need to think more like developers - not accepting the grunt work or death by a thousand cuts of data - and take a hard look at what we've done historically in data and what should be replaced.
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Data Mesh Radio episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Nov 7, 2022 • 1h 25min

#151 Driving Interoperability via Taxonomies and Tagging to Power Personalization - Interview w/ Jill Maffeo

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Jill's LinkedIn: https://www.linkedin.com/in/jillianmaffeo/
Developing Interoperable Channel Domain Data (blog post): https://vista.io/blog/developing-interoperable-channel-domain-data
In this episode, Scott interviewed Jill Maffeo, Senior Data Product Manager at Vista.
Before jumping in: Jill gives a lot of very useful examples of outcomes they've been able to drive that could be abstracted to apply to your own organization's business challenges - outcomes like better customer segmentation, faster time to launch new offerings, etc. If you are having difficulty with stakeholder buy-in, especially from someone in marketing, this episode could help you frame things in their language.
Some key takeaways/thoughts from Jill's point of view:
- "When you're thinking about interoperability, it's just playing nice, right?" If you think of interoperability as a key part of your culture, it's easier to implement. Let people know why interoperability is good for them and for the whole company.
- Taxonomies help drive interoperability because there is already an established language, even if things don't fit perfectly. New concepts can emerge and your taxonomies should change, but a taxonomy gives interoperability discussions at least a common starting point.
- Taxonomies are a living thing - make sure they aren't overly rigid and be prepared to continually evolve and improve them.
- Within your taxonomy structure, if there is a reason for things to be unique for a domain or use case, that is okay. Look for potential ways to also convert that data to best fit your taxonomy, but you don't want to force a square peg through a round hole.
- Taxonomies really start to add a lot of value at scale. They are somewhat costly upfront, with likely moderate return on investment early; but, if you do them right, they will pay back a lot as you move forward. They make historical analysis - especially with interoperability - far easier because you've done the work ahead of time.
- Taxonomy, when done well, is about balancing standardization and flexibility. Much like most things in data mesh, it's about finding the right balance for your organization.
- Most customer journeys are cross domain. The more you make domain data interoperable, the more insight you have into the customer - insight that can drive better business for your organization as a whole instead of only trying to locally optimize value by domain.
- Similarly, many executive questions and desired insights are not exclusive to a domain. Can we get to answering their questions much more quickly and completely? What value would that drive?
- To get funding for longer-term initiatives with a payoff down the road, like taxonomies, look to directly attach your work to execs' strategic goals. Make it easy for them to see why this will be of value to invest in now rather than later.
- If a line of business isn't ready to engage on a key long-term strategic goal like very rich metadata, look for places to make progress that you control. That way, you can capture more value from the data being generated now, so when you look to reengage, there is more incentive.
- When re-engaging with a team that declined to work with you the first time, use as much empathy as possible. Yes, they said no, but now is the time to welcome them to the party and invite them in. No 'I told you so', only 'great, how can we help?'
- It's easy to lose sight of differentiation in metrics. With a good taxonomy and metadata strategy, you can differentiate better between actions - e.g. not how many emails did you send but what type and why? Five marketing emails in a week is probably bad, but four order-related emails and one marketing email isn't.
- It's okay - and probably advisable - to have your taxonomy and tagging serve multiple use cases. A postcard and a wedding invite look the same to manufacturing but are marketed very differently.
- A potential way to entice people to participate earlier in a data mesh journey: let them know they will have more say and influence on the general choices. However, you still want domains that are just starting to participate to know that you want their feedback and that their voice also matters.
- Think about building interoperability like building a structure. You need everyone coordinating - you want the electrician and the plumber to do their work before the drywaller. Redoing the work instead of setting your build schedule in place doesn't sound like a good idea.
- Focus on getting your standards in place for interoperability. Scott Note: no one is publishing their standards and this NEEDS to be done. People need to share their standards explicitly because everyone is reinventing them.
- When picking early use cases, potentially look for complementary use cases where having the information needed for each individual use case can power even further use cases.
- Good - even if only rudimentary - taxonomy and tagging lets you easily see how things have changed over time without having to manually stitch data. It also lowers the barrier to external domains leveraging your data.
- A key benefit of taxonomies is the ability to tell richer stories. You have data across many different domains or business outcomes, but you can see how they interplay.
- ?Controversial?: Standard tagging and taxonomy can remove - and even prevent - tech debt, partially because it prevents some manual stitching of data.
- A good check for teams helping manage a data product portfolio is to take a few stakeholder questions and use what's available to try to answer them. How was the user experience? Is the information easy to understand? Etc.
Jill started off by talking about her current role, where she manages a "team of products that span the gamut between data ingestion, data curation, metadata curation and creation, and also [the] upper funnel portfolio around measurement and insight from a demand perspective." Essentially, she is managing a suite of data products around marketing that covers a wide variety of needs - some more technical and behind the scenes, and many on the front line of powering analytics.
Taxonomy can be a significant help in driving interoperability. Taxonomy is ultimately about finding the right balance between flexibility and standardization, according to Jill. Much like many decisions in data mesh, it isn't black or white - it's about finding the happy balance somewhere on the spectrum of grays. Having simple tagging and taxonomy has allowed them to see how something like marketing materials - e.g. postcards and flyers - has evolved over time without having to manually connect the data each time; when a new product is launched, it is added to the tagging and product hierarchy.
Jill also believes that good tagging and taxonomy remove - and prevent - tech debt. It means you have an easier time generating new insights without having to do manual stitching on a one-off basis, and it creates a much lower barrier to people leveraging the data from outside the domain - if they can understand the general taxonomy, they don't have to be as deep in the context to leverage the data effectively. And taxonomy can help more effectively share metadata to let other domains really understand what a domain is doing.
What Jill and her team saw early in their data journey was that each team was doing a good job tracking their metrics, but all the metrics were siloed. Even though the channels were run independently, the organization obviously needed information that could span them. So Jill and team started to really classify and say something like 'hey, it looks like you are doing what you call X and team B is doing something they call Y, but we'll map both to Z.' That way, it's far easier to look at a customer journey across domains and silos. You can tell much richer stories.
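As a rough illustration of that kind of mapping (the domains, labels, and taxonomy terms below are hypothetical, not Vista's actual taxonomy), a shared taxonomy can sit alongside each domain's local labels so cross-domain journey questions become a lookup rather than a one-off stitching exercise:

```python
# Hypothetical sketch: translate each domain's local labels onto a shared
# marketing taxonomy so customer journeys can be read across domains.

SHARED_TAXONOMY = {
    # (domain, local label) -> canonical taxonomy term
    ("email_team", "promo_blast"): "marketing.email.promotional",
    ("email_team", "order_update"): "transactional.email.order",
    ("paid_media", "retargeting_ad"): "marketing.display.retargeting",
}

def to_canonical(domain: str, local_label: str) -> str:
    """Return the shared taxonomy term, or flag the label as unmapped."""
    return SHARED_TAXONOMY.get((domain, local_label), f"unmapped.{domain}.{local_label}")

# Events tagged with whatever each team happens to call things locally.
events = [
    {"customer_id": 42, "domain": "email_team", "label": "promo_blast"},
    {"customer_id": 42, "domain": "paid_media", "label": "retargeting_ad"},
    {"customer_id": 42, "domain": "email_team", "label": "order_update"},
]

# A cross-domain view of one customer's journey, in shared terms.
journey = [to_canonical(e["domain"], e["label"]) for e in events]
print(journey)
```

Labels that don't fit yet simply surface as "unmapped", which is one way to let the taxonomy evolve over time rather than forcing a square peg through a round hole.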
Where Jill really sees the biggest value with taxonomies is at scale. To start out with, they are helpful, but as you add more and more complexity to your data mesh implementation, instead of combining data by manually integrating data products with each other, you can quickly see historical trends across multiple domains to get a much bigger picture. The integration work, at least at the concept level, was done for you. But taxonomies are also not a magic bullet.
A key strategic goal at Vista over the last few years has been personalization, according to Jill. Being able to see the big picture of customer journeys and intersect with them with the right offerings at the right time - that sounds like business nirvana. But if you can't see everything that is happening across your many offerings, is that really possible? Or are you at least preparing yourself with historical data for a potential ML model down the road? Getting the right metadata in place early - that early investment - set them up for value down the road.
Jill talked about how, after doing her PoC around taxonomy and metadata creation and curation, some lines of business weren't really ready to engage. But instead of putting the taxonomy project on hold, she looked for additional places where she could make progress so that, when those lines of business were ready, the value creation at that stage wouldn't have missed all the necessary data in the meantime. Essentially, she looked to avoid the loop of 'it's not of immediate value but we'll get to it eventually', followed six months down the road by 'oh, if only we had this data over the last six months, then this would be of value.' Once what Jill and team implemented started to show early wins, they circled back with the business to say 'hey, this is generating value, here's how it saved you some effort', so the business people were more ready. And the teams they had helped already started to advocate for more of the work too. That gave Jill's team proof points when going to execs and other stakeholders that investing time in metadata and taxonomy is driving value - and that their participation would drive more value for all.
A key win for Vista, according to Jill, has been moving from a metric like how many emails were sent to someone in a certain period to what kind of emails - was it all promotional or were some of them based on an order? Getting more granular about the 'what' has helped them reduce email opt-out rates, for example. The next phase of that is to look at the specific interactions a customer had and develop a next best action model - what can the company do to drive more business that is highly relevant? Again, that personalization. Jill gave a number of additional useful examples of what tagging and taxonomy are driving for them.
Jill also discussed how there shouldn't be one overarching taxonomy that everything must adhere to - you want to look at data from multiple angles. At Vista, customers may be looking for different things - like a postcard versus a wedding invite - but to manufacturing, it all looks the same. And you again want to apply personalization, so you don't ask someone if they'd like to create another wedding invite a year later… But if they are a restaurant that orders menus every 3 months with a change in the season, you want to 1) send them promotions ahead of time and 2) if they are late in their order pattern, potentially escalate to do something additional.
Especially when looking at taxonomies, Jill believes you can entice people to participate earlier by pointing out that they will influence the overall choices more than those who come later - they get more say. But it's important to balance that as you bring on new teams: their voice matters and their feedback is important to continue to improve and evolve your taxonomies.
Jill gave the example of building a home as to why we need to be coordinated in how we generate our data, with things like a unique ID: do you want the drywaller to do their work before or after the electricians and plumbers have done theirs? Getting the coordination done upfront on how you will look to combine data, even if not perfect, can save you a fair amount of time, money, and headaches down the road.
Most customer journeys are cross domain. Jill showed how the more you make domain data interoperable, the more insight you have into the customer - insight that can drive better business for your organization as a whole (globally maximize customer value) instead of only trying to locally optimize value by domain. And for many - most? - of your executives, their questions and desired insights are rarely confined to one domain. So how are you working to really answer their questions and build the ability to answer cross-domain questions?
Jill talked about phase one versus phase two of a data mesh journey. In phase one, you are focused on creating data products to meet specific use cases, and it's pretty easy to end up with some overlap. When you get to looking at your data products as a full suite, those overlaps look like extra cost. So early in your journey, make sure domains are communicating about what they are building to prevent doubled-up work. It will still probably happen, and that's okay. But part of product thinking and product management is portfolio management.
Quick tidbits:
- A good check for teams helping manage a data product portfolio is to take a few stakeholder questions and use what's available to try to answer them. How was the user experience? Is the information easy to understand? Etc. That will inform future platform improvements and new data products.
- "When you're thinking about interoperability, it's just playing nice, right?" If you think of interoperability as a key part of your culture, it's far easier to implement. Let people know why interoperability is good for them and the whole company.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Nov 6, 2022 • 26min

Weekly Episode Summaries and Programming Notes – Week of November 6, 2022

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Nov 4, 2022 • 1h 23min

#150 3 Years in, Data Mesh at eDreams: Small Data Products, Consumer Burden, and Iterating to Success, Oh My! - Interview w/ Carlos Saona

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Carlos' LinkedIn: https://www.linkedin.com/in/carlos-saona-vazquez/
In this episode, Scott interviewed Carlos Saona, Chief Architect at eDreams ODIGEO.
As a caveat before jumping in: Carlos believes it's too hard to say their experience or learnings will apply to everyone, or that he necessarily recommends anything they have done specifically, but he has learned a lot of very interesting things to date. Keep that perspective in mind when reading this summary.
Some key takeaways/thoughts from Carlos' point of view:
- eDreams' implementation is quite unique in that they were working on it without being in contact with other data mesh implementers for most of the last 3 years - until just recently. So they have learnings from non-typical approaches that are working for them.
- You should not look to create a single data model upfront. That's part of what has caused such an issue for the data warehouse - it's inflexible and doesn't really end up fitting needs. But you should look to iterate towards that standard model as you learn more and more about your use cases.
- ?Controversial?: Look to push as much of the burden as is reasonable onto the data consumers. That means the stitching between data products, the compute costs of consuming, etc. They get the benefit, so they should take on the burden. Things like data quality are still on the shoulders of producers.
- You should provide default values for your data product SLAs. It makes the discussion between consumers and producers far easier - is the default good enough or not?
- ?Extremely Controversial?: At eDreams, you cannot publish data in your data product that you are not generating. In derived domains (e.g., customer history), "generate" includes the derived stitching. NOTE: Go about an hour into the interview - not episode - for more specifics.
- When starting with data mesh, there must be a settling period - consumers must understand that things are subject to change while a new producer really figures things out for the first few weeks to months.
- You want to avoid duplicating data. But you REALLY want to avoid duplicating business logic.
- Be careful when selecting your initial data mesh use cases. If a use case requires a very fast time to market, while it has value, you likely won't have the time and space necessary to experiment and learn. You need to find repeatable patterns to scale in data mesh; hurrying is a way to miss the necessary learning.
- Look ahead and build ahead for obvious interoperability. E.g. create foreign keys for data products that don't exist yet but will.
- Be clear about what early data mesh participation means - what will it net domains that are part of the early implementation? And be specific too about what your early implementation won't include or achieve. Don't over promise and under deliver.
- Similarly, strongly emphasize that learning is a priority in your early implementation and that you are factoring learning into promises and estimations. You can't promise you'll find the right solution to a challenge on day one; things need space to evolve as you learn more.
- It's okay to not have everyone as part of your initial implementation - engagement or even buy-in - but set yourself on a path where their participation is inevitable.
- Making data a first class citizen doesn't just happen. There is incremental work to be done by the domains. Make sure you reserve time to actually do that work - the data quanta creation and maintenance.
- It is not feasible to have your documentation be fully self-describing for everyone. eDreams chose to set the bar at documentation that is self-describing for readers who already know the domain. For readers who do not know the domain, that introduction must happen somewhere else.
- At the start of a data mesh journey, your central team will likely control all the use cases being served by the mesh. But at some point, self-serve needs to happen. Consumers need to be able to serve their needs without their use cases going through the central team.
- With small-size data products and the data combination burden being on consumers, versioning tends to not be as much of a problem because the concepts of domain events don't change that often. Or when they do, the retention window has typically been short, making versioning easier.
- To make data producers feel a better sense of ownership: 1) look for ways for producers to better leverage their own data; 2) maximize the number of consumers of their data quanta so issues with the data product are identified sooner - more eyes means more people who can spot issues; and 3) create automation to easily/quickly let domains identify sources of data loss rather than searching: with proper setup, you can make it easy to identify whether the data pipeline is the problem. If it's not, then the issue is in the domain.
When Carlos and team were looking at how to tackle their growing data challenges a few years ago, they were reviewing requests for proposals (RFPs) from a number of data consultancies around building out a data lake but just were not convinced it would work. Then they ran across Zhamak's first data mesh article and decided to give it a try themselves. Until more recently, Carlos was not aware of the mass upswing in hype and buzz around data mesh, so their implementation is very interesting because it wasn't really influenced by other implementations.
When they were starting out, Carlos said they didn't want to try to create a single, overarching approach. It was very much about finding how to do data mesh incrementally. They started use case by use case and built it out organically, including the design principles and rules - they knew they couldn't start with a single data model, for instance. But it was quite challenging iterating towards that standard data model.
When choosing their initial use cases to try for data mesh, Carlos and team had some specific criteria. They rejected anything that needed a very quick turnaround because it wouldn't let them have enough time/space to try things, learn, and iterate. They did plan ahead by creating foreign keys to data products that didn't exist yet, to make interoperability easier down the road when those products would exist.
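A minimal sketch of what "building ahead for obvious interoperability" could look like - all product and column names here are hypothetical, not eDreams' actual schemas - is to declare the foreign key in today's data product even though the product it points at doesn't exist yet:

```python
# Hypothetical sketch: a source-aligned data product declares a foreign key
# to a planned-but-not-yet-published data product. Consumers carry the key
# today, and joins become possible the day the other product ships.

booking_events_product = {
    "name": "booking-events",
    "owner": "bookings-domain",
    "schema": {
        "booking_id": "string",
        "customer_id": "string",   # FK -> customer-profile.customer_id (planned)
        "booked_at": "timestamp",
        "amount_eur": "decimal",
    },
    "foreign_keys": [
        {
            "column": "customer_id",
            "references": "customer-profile.customer_id",
            "status": "planned",  # target data product does not exist yet
        }
    ],
}
```

The design choice is simply that the key is stable and documented from day one, so downstream consumers never have to backfill identifiers once the referenced product arrives.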
They were also very honest with stakeholders about what early participation meant - and what it didn't mean; that way, it was clear what benefits stakeholders could expect.
According to Carlos, while they had executive support and sponsorship for data mesh, that wasn't enough to move forward with confidence at the start. They needed a few key stakeholders who were engaged as well and wanted to participate. It was also okay to have some stakeholders not engaged but just informed of what they were trying to do with data mesh. You don't have to win everyone over before starting.
Five things Carlos thinks others embarking on a data mesh journey should really take from their learnings: 1) it's okay to not have everyone really bought in or especially engaged upfront, but they will have to participate - make their eventual participation inevitable. 2) Really emphasize that you are learning in your early journey, not that you have it figured out - and factor in learning when making estimations and promises. 3) Don't try to design your data model from the beginning; you need to learn via iteration - you will start to find the standards that make it easy to design new data products. 4) When treating data as a first class citizen, understand that it will take additional time. Reserve the team's time to create and maintain their data quanta. 5) Let the use cases drive you forward and show you where to go.
Carlos' philosophy is: within reason, push as much of the burden onto the consumer as you can. Obviously, we don't want consumers doing the data cleansing work - that's been one of the key issues with the data lake - but the costs of consumption should fall on the data consumers as they are the ones deriving the most benefit. So eDreams makes the consumers own stitching data products together for their queries and makes them pay for the consumption. This minimizes the costs - including maintenance costs - to producers.
One very interesting and somewhat unique approach - at least as far as Scott has seen - is how truly small Carlos and team's data quanta are. Thus far, they have really adhered to the concept that each data quantum should only be about sharing a single type of domain event and really nothing more. This again makes for lower complexity and maintenance costs for data producers. They are considering changes with upcoming BI-focused data products, so that is to be determined.
Carlos believes - and Scott exceedingly strongly agrees - that it is not feasible for your data quanta documentation to be fully self-describing. You can't know someone else's context. You need to write good documentation so people can still understand what the data product is and what it's trying to share, but for a reader without knowledge of the domain, it would take a considerable amount of effort - essentially impossible to do right - to fully explain the domain and how it works in the documentation of each data product. Getting to know exactly how other domains work is outside the scope of the data mesh.
At the start of their journey, the data team was in control of all the use cases, who was consuming, and who was producing, according to Carlos. But as they've gone wider and there is a self-service model for data consumers, more and more of the use cases are directly between producers and consumers - or the consumers are consuming without much interaction with producers if they already know the domain. It could become an issue with people trying to understand data from lots of different domains just for the sake of understanding, but it hasn't been an issue so far.
To date, Carlos hasn't seen many problems around versioning. They thought they would have many more issues with versioning than they have, which Carlos attributes to keeping their data products as small as possible and using domain events. When they have had to version, the retention window for the data has been relatively short, so moving to the newer version has been relatively simple. And because most people are getting their data from source-aligned data products, changes have a smaller blast radius - they won't affect data products that are downstream of a downstream of a downstream data product. Domain events have been enough because their main stakeholder has been machine learning. They are now working on a different kind of data quantum for consumers such as BI, and they plan to include more governed versioning there.
One of the biggest challenges early on, according to Carlos, was that domains didn't really feel ownership over the data they shared. So to increase the feeling of ownership, they first looked for ways for producing domains to use their own data - as many other guests have mentioned. Second, they tried to maximize additional consumers of data products by looking for use cases. That led to faster feedback loops if there was a problem - more eyes on the data - so producers discovered issues sooner. And third, the platform team helped identify issues that might be in the system or in the data platform/pipeline process - if there was data loss, automation helps identify whether it is on the platform side; if it's not on the platform side, then it is an issue within the domain. That one piece of automation has meant a lot less time searching for the cause of data loss and more time fixing it.
Carlos and team built in a few different layers of governance. The first is a universal layer of standard metadata in each data product - when something happened, who the owner is, the version of the schema, the existence of a schema, etc. These are enforced automatically by the data platform, and you can't put a data product on the mesh without complying. Producers must also tag any PII or sensitive information like credit cards. The second layer is policies for data contracts between producers and consumers. As many guests have suggested, they have found that having default values for SLAs in data contracts provides a great starting point for discussions between data producers and consumers.
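As a hedged sketch of that idea (illustrative field names and values only, not eDreams' actual defaults), platform-supplied defaults turn the negotiation into "is the default good enough?" rather than starting from a blank page:

```python
# Hypothetical sketch: default SLA values for a data contract, with the
# producer and consumer only discussing the fields they need to override.

DEFAULT_SLA = {
    "freshness_minutes": 60,        # data lands within an hour of the event
    "completeness_pct": 99.0,       # expected share of events delivered
    "retention_days": 30,           # how long history is kept
    "schema_change_notice_days": 14,
}

def build_contract(producer, consumer, overrides=None):
    """Start from the platform defaults and apply only agreed overrides."""
    sla = {**DEFAULT_SLA, **(overrides or {})}
    return {"producer": producer, "consumer": consumer, "sla": sla}

# A consumer that needs fresher data negotiates that one field and nothing else.
contract = build_contract("bookings-domain", "ml-ranking-team",
                          overrides={"freshness_minutes": 15})
print(contract["sla"])
```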
"You can have your cake and eat it too" using domain events, per Carlos. You don't want direct operational-path queries hitting your data quanta, as they are designed for analytical queries and have a separate latency profile. But at eDreams, the pipeline that writes data quanta to the analytical repository is implemented with streams that can be consumed in real time by operational consumers (microservices).
Other tidbits:
- When launching a new data product, there must be a settling period - consumers must understand that things are subject to change while the producer really figures things out.
- You want to avoid duplicating data. But you REALLY want to avoid duplicating business logic.
- Data products should have customized SLAs based on use cases. You don't need to optimize for everything. Let the needs drive the SLAs.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Nov 1, 2022 • 12min

#149 Data Mesh and Community: The Future? - Mesh Musings 34

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Oct 31, 2022 • 1h 19min

#148 It's A-Okay to Solve for Today: ANZ Plus's Early Data Mesh Success - Interview w/ Adelle McDonald

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Adelle's LinkedIn: https://www.linkedin.com/in/adelle-mcdonald-79a9a2139/
In this episode, Scott interviewed Adelle McDonald, Customer and Origination Lead at ANZ Plus, a bank in Australia and New Zealand.
Some key takeaways/thoughts from Adelle's point of view:
- To drive buy-in through 1:1 conversations with domain owners - the business leaders - you will have to tailor your conversation to each person. Listen to their pain and reflect it back to them.
- Focus on the ability to quickly pivot at low cost. That can mean things aren't as product-worthy to start, but it means you can evolve towards value more quickly.
- Addressing domain owners' pain points gets them looking at you as a partner. They will be much more willing to work with you, especially as you partner to provide actionable insights.
- ?Controversial?: ANZ Plus is embedding data leads into domains to handle the data quanta for the domain and also build the team what they need from data. As part of that, they are slowly building up the domains' capabilities to handle their own data. This minimizes friction and creates buy-in but is likely not sustainable long term - ownership will need to be transferred.
- It is very important to tie data quanta to use cases - driving value for users means focusing on use cases.
- Developers or software engineers owning data is complicated. Make it so they can start to make small changes and learn in a safe way instead of dumping all ownership on them at once. Ownership and knowledge aren't a switch you flip.
- Using a git-based, pull request approach, developers can attempt data work without manual stitching so they learn to do the work themselves; but it can still be easily overseen by someone with more data expertise.
- One way to potentially drive executive buy-in is joint/collaborative KPIs. So it's not just about their domain's results but how well they work and drive results with another domain.
- ?Controversial?: It's okay to have a data asset with murky long-term ownership at first. If usage picks up, you want to convert it to a proper data quantum, but we need to be able to test the waters with things and see if they are actually useful first. Clarity comes with usage.
- When creating anything data related, use a software development lifecycle (SDLC) approach. Domains may create something exclusively internal to the domain, but once you look to share externally, you have rules, standards, and best practices. Move from the pipeline approach to the software approach to data.
- Automatically generated documentation can considerably help with governance. You have it in the same repo, and it can handle a large part of explaining what is happening with the data, making other decisions easier.
- Automate your governance checklist as much as possible so you prevent the manual work of governance. No gates, simple checks - that's a winning, low-friction way to govern.
- When you don't automate your governance checklist, domains often feel they need to invent or buy tooling to comply with governance. Making it frictionless to check through automation means far less complexity and fewer issues.
Adelle started the conversation on driving buy-in and how important it is to tailor your message. As prior guests have also noted, the easiest way to drive buy-in is by helping out the person whose buy-in you are trying to get as part of the process. So find their needs and help drive to a positive outcome for them with data first.
When it comes to getting buy-in from domain owners, Adelle has seen that finding their pain points and finding good ways to address them will get them to see you as a partner in leveraging their data. They will be much more willing to work with you rather than you simply putting new responsibilities on their plate. You can work with them to ensure their information - especially purchased data - is providing value and gives people actionable insights, not just interesting insights. It may be a tough pill to swallow, but you need them to see you as that partner in the long run.
At ANZ Plus, they are embedding data leads into the domains to be the main point of data contact for external domains, according to Adelle. Those data leads serve the domains by helping to address their internal business needs with data while also creating the data sharing mechanisms - the actual data quanta in data mesh terms - for sharing that domain's data across the rest of the organization. With this work falling on the data leads, ANZ Plus is not generally asking domains to take on too much responsibility relative to data - at least not yet. This minimizes the work the domains have to take on but still significantly accelerates the time to business value within the domain for new data use cases. As a result, the domain owners have been very happy to work with the data leads because there isn't much incremental work they are responsible for - at least at first.
Adelle and the data mesh fans/leaders at ANZ Plus are aware that their data ownership model is probably not the right fit in the long run. But it's working well for them right now, and that's what matters to them. They have found a setup that doesn't add a ton of overhead process and will evolve as the capabilities and resources to hand over actual data ownership get built out more and more. When they evolve, they are focused on maintaining the ways of sharing context rather than trying to keep the exact data quanta as is. But if your organization isn't very clear that things will constantly evolve, this could be a very hard setup to maintain in the long run.
As part of their plans for evolution, Adelle mentioned that they are focusing on maintaining the ability - at a low overall cost - to pivot. As the world around them changes and evolves, and as they learn more by acting on the actionable insights they generate, the team needs to make sure they are evolving along with the world and their markets to drive more value. They are focusing on getting to the right insights as fast as possible in sustainable ways.
Thus far, Adelle and team are finding it's very important to tie use cases to data quanta. To drive value for customers, they need to focus on the use cases. This is especially relevant when looking at customer journeys - you want to set yourself up to collect the right data to analyze to understand what's going on with that customer journey.
Adelle emphasized the need to create an environment for safe evolution by developers relative to data. At first, developers won't know how to deal with data in general as an entire concept, but most will understand at least some aspects of working with data. So how can you get them more and more used to dealing with data and learning more? By providing them a way to make changes when they need to, in a safe and easy way. Teaching developers and software engineers to deal with data isn't a switch you flip, just like handing over ownership - it's best done as a gradual process. Easier said than done, of course.
According to Adelle, it's okay to have somewhat murky data ownership at first for a new potential data asset. If it starts to get broader use, you need to lock in who will own it, how, and why, but you don't need to get ahead of yourself and drive towards a perfect data quantum every time you look to share data. Have high-context exchange with other users, let them know how much they can trust things, but also be in a mode of trying things out and seeing where they might go before investing the time into data quantum creation. This gives domains more freedom to play with their data, but data consumers also must be flexible, since what they are consuming might evolve as it gets molded into something more scalable, usable, trustable, etc. Without an ability to evolve quickly, this model will likely not work.
For Adelle, it's crucial to think about data like any other software development. Your software development lifecycle (SDLC) needs to have things like governance and API interfaces as part of the development. As an example on the operational side, domains can build small internal apps that aren't good for external domains while they figure things out and test. But once other domains need access, you have to start treating your sharing access like a product. The same goes for data. You should have standard practices internally to make this low friction.
Automated documentation has been a big win for governance, according to Adelle. While you still need additional documentation, if you can have the base-level documentation auto-generated in the repo with your data quantum code, that's very helpful for sharing what you are doing with data and why. It also makes other governance decisions easier because people can see what is happening and what the information is about. They've automated much of their governance checklist as part of their software development lifecycle as well, so people can test whether they meet governance requirements as they are developing, not at a gate at the end.
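A minimal sketch of what such an automated, no-gates governance check might look like - the fields and rules below are hypothetical, not ANZ Plus's actual checklist - is something a pipeline can run on every pull request so developers see failures while developing rather than at a review gate:

```python
# Hypothetical sketch: run a governance checklist against a data product
# descriptor in CI, so compliance is a fast automated check, not a manual gate.

REQUIRED_FIELDS = ["name", "owner", "purpose", "schema", "documentation_url"]

def governance_findings(descriptor):
    """Return a list of human-readable findings; an empty list means pass."""
    findings = [f"missing required field: {f}" for f in REQUIRED_FIELDS if f not in descriptor]
    # "Only collect data for a specific reason" - require a stated purpose.
    if not descriptor.get("purpose"):
        findings.append("no purpose stated for collecting this data")
    # Any column flagged as sensitive must carry a handling policy.
    for column, props in descriptor.get("schema", {}).items():
        if props.get("sensitive") and not props.get("handling_policy"):
            findings.append(f"sensitive column '{column}' has no handling policy")
    return findings

descriptor = {
    "name": "customer-onboarding-events",
    "owner": "origination-domain",
    "purpose": "measure drop-off in the onboarding journey",
    "documentation_url": "https://example.internal/docs/customer-onboarding-events",
    "schema": {
        "customer_id": {"type": "string", "sensitive": True, "handling_policy": "tokenised"},
        "step": {"type": "string"},
    },
}

findings = governance_findings(descriptor)
assert not findings, findings  # fail the CI job and list the findings if any
```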
A few tidbits:
- Only collect data for a specific reason. If you don't know specifically why you are collecting it, why are you collecting it?
- By making incremental data work requests a pull-based system (think pull requests in git), you can work somewhat asynchronously and developers can learn to attempt data work in a safe environment.
- Before investing in doing data work, ask whether it is recurring, what the value is, and why you are doing it. If it will be recurring work, look to automate as much as possible first. It might not yet be data quantum worthy, but it sets you up for when that time comes.
- One way to potentially drive executive buy-in is joint/collaborative KPIs. So it's not just about their domain's results but how well they work and drive results with another domain.
- When you don't automate your governance checklist, domains often feel they need to invent or buy tooling to comply. Making it frictionless to check against governance requirements, as part of the SDLC, means far less complexity and fewer issues.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Oct 30, 2022 • 28min

Weekly Episode Summaries and Programming Notes – Week of October 30, 2022

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Oct 28, 2022 • 1h 4min

#147 Mapping Out Your Data Product Suite - Building Your Roadmap to Maximizing Business Value - Interview w/ Gunjan Aggarwal

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Gunjan's LinkedIn: https://www.linkedin.com/in/gunjanaggarwal/
Gunjan's Medium: https://gunjan-aggarwal.medium.com/
In this episode, Scott interviewed Gunjan Aggarwal, Head, Digital Data Products and MarTech Strategy at Novartis. To be clear, she was only representing her own views on the episode.
Some key takeaways/thoughts from Gunjan's point of view:
- Set your overall data product strategy - for when you are in stage 2, going wider with data mesh - earlier in your journey than many may think. It's easy to focus only on use cases instead of the bigger picture.
- Make sure to align early on who owns what and what the clear boundaries between roles are. Otherwise, with the amount of change data mesh drives, there will likely be unnecessary chaos. Get specific.
- Don't fall for the 'Data Field of Dreams' - "if you build it, they will come." Focus on building to actual problem statements. Involve people early, make them accountable, give them skin in the game and they will care.
- "The more you ask why, the more clarity you will get." Really dig deep into the reasoning for creating new data products or ARDs (analytics ready datasets). If we have this data product, what will it unlock for us?
- It's crucial to avoid the trap of building data products specifically to single use cases. You must have the bigger picture in mind and focus on reusability instead of only solving one set of challenges. Can you extend an existing data product?
- Data people should have domain knowledge where possible. That way, they can push back on requirements that don't make economic sense - that don't maximize the return on investment.
- A 4-part approach to designing data products: 1) find clarity on the problem statement; 2) assess who the personas are that will benefit from it; 3) dig into what you already have available; and 4) focus on serving value to the problem statement in a way the persona can use.
- ?Controversial?: Scalability is more important than time to market when it comes to data products, especially as you develop a broader set of data products. Tech debt around scaling is hard to combat just as you are delivering strong value with the need for scale. And it limits additional use cases leveraging existing data products if they can't scale.
- Look to provide as many easy paths as possible for new data products: templates, blueprints, standard schema, a global taxonomy, etc. Teams don't have to use them, but they are great starting points.
- You need to be proactive in partnering with the business. Data people have historically waited for requests/requirements. That won't lead to fast feedback loops and quickly iterating towards value.
- It's easy to end up focusing on the single use case instead of the bigger picture. But this will likely result in business disruption from a half-baked product.
Gunjan is in phase 2 of a data mesh implementation at the moment - the going-wide phase. As part of that, she's looking at how to create a suite of data products to serve the needs of a broader set of use cases and at putting things in place to more easily serve ad hoc querying. She recommends setting your data product strategy for your long-term needs earlier in your journey than most might think - what is the real business strategy for your mesh as a bigger entity than just individual data products? How will they actually work together so 1 + 1 = 3?
According to Gunjan, it is very important to clearly define boundaries and responsibilities for roles. It's easy to get confused about what a data product owner is versus a product manager, for example. Look to the RACI model for defining things clearly. If there is a lot of change and unclear responsibilities, that can cause lots of challenges and chaos. If you don't have alignment early, it's very easy to go wrong. So make sure you spend time before moving forward with data mesh to really focus and align on why you are doing this and who will own what. Start with the end goal in mind and march forward together.
It's crucial to involve people early in every data product you develop, in Gunjan's experience. If you build something for them instead of building it with them, they are far less likely to buy in. Make people accountable - their data product "is their baby". Make them part of defining success for a data product and work with them to make sure they can scale it up when it succeeds.
For Gunjan, when considering new data products, always start with asking why. Why do we think this will drive incremental value? Why is this the right time? Dig in layer by layer to understand why this is a good use of our time and what our expected business value is from doing it. It's easy to miss the forest for the trees. This method also makes finding reusability more likely. Why can't you use what is already built? And if there is a good answer, make sure to build your data products so they are reusable for other use cases down the line.
Digging deeper into reusability and extensibility: when you look at new use cases, consider whether you need new data products to support them or whether you can modify and extend existing data products instead. It's quite easy to build data products to try to support each use case individually, but that will quickly overwhelm your teams. Look at the greater whole for how you can support your needs with a suite of data products.
Where possible, Gunjan believes it's best if your data people have domain knowledge - they can push back on the cost/benefit of choices far better than someone without that specific knowledge. Do you actually need real-time? What is the impact of different SLAs on the return and on the cost to create/maintain? It's far easier to maximize return on investment if one person understands both sides of the equation.
Gunjan and team have a four-part approach to building out new data products: 1) "Find the clarity on the problem statement." What exactly is the use case you will be serving? 2) "Who are your personas you are serving and what will they require?" Get specific about who you are trying to serve. 3) "What is available?" Dig into what you already have and evaluate if what exists can serve the use case(s). 4) Focus on providing the value to serve the problem statement in a way the personas can benefit from.
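One way to picture that four-part approach is a lightweight intake template that every proposed data product fills in before build starts - the field names below are illustrative, not Gunjan's or Novartis's actual template:

```python
# Hypothetical sketch: an intake template mirroring the four questions -
# problem statement, personas, what already exists, and the value to serve.

from dataclasses import dataclass, field

@dataclass
class DataProductProposal:
    problem_statement: str                          # 1) clarity on the problem being solved
    personas: list = field(default_factory=list)    # 2) who benefits and what they require
    existing_assets_reviewed: list = field(default_factory=list)  # 3) what is already available
    reuse_or_extend: str = ""                       # can an existing data product be extended instead?
    value_to_deliver: str = ""                      # 4) the value served, in the personas' terms

proposal = DataProductProposal(
    problem_statement="Field teams cannot see channel engagement per account",
    personas=["field rep", "brand analytics lead"],
    existing_assets_reviewed=["crm-interactions", "email-engagement"],
    reuse_or_extend="extend email-engagement with account-level rollups",
    value_to_deliver="weekly engagement view without manual exports",
)
```

Forcing the reuse question into the template is one way to keep the bigger picture in view before a new, single-use-case product gets built.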
In the long run, Gunjan believes the ability to scale up is more important than speed to deploying a data product. When you rush to create data products, you will inevitably create a lot of tech debt, especially around scaling when the time comes, so focus more on the ability to scale. It might not feel like that at first, but it's especially important when you have many data products. It will be very frustrating when you have additional use cases and cannot easily scale because you wanted to release your single data product two weeks earlier. And focus on prioritization as well. Some use cases will have to wait, and that's okay.
According to Gunjan, you need to get pretty close to your domain business partners. Embed yourself in more business meetings discussing use cases and problem statements. You shouldn't be waiting for requests; you should be extracting that information in regular discussions. Fast feedback cycles leading to fast iteration are crucial.
There is no such thing as a future-proof tech stack. So Gunjan recommends 1) accepting that and 2) preparing yourself for graceful evolution to meet needs. You should always be asking what the risks to your platform are and how you can mitigate them. Modularize your tech stack so you can easily add and/or replace components when necessary.
In wrapping up, Gunjan talked about the importance of not focusing on the single use case but on how it plays into the bigger picture, the long term. It's easy to go down the path of focusing on the single use case, especially as you start out, but it will cause disruption to the business from a half-baked product. Product thinking, not project thinking.
Quick tidbit:
- Easy data product templates and a good centralized catalog are very important. Having a good catalog can help you identify gaps in your data product coverage as well.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Oct 25, 2022 • 15min

#146 False Dichotomies and Baseless Binary Choices - Why We Need New Thought Approaches in Data - Mesh Musings 33

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Oct 24, 2022 • 1h 6min

#145 From Failwhale to Massive Scale and Beyond: Learnings on Fixing Data Team Bottlenecks - Interview w/ Dmitriy Ryaboy

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/Please Rate and Review us on your podcast app of choice!If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see hereEpisode list and links to all available episode transcripts here.Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.Dmitriy's Twitter: @squarecog / https://twitter.com/squarecogThe Missing README book: https://themissingreadme.com/Building Evolutionary Architectures book: https://www.oreilly.com/library/view/building-evolutionary-architectures/9781491986356/In this episode, Scott interviewed Dmitriy Ryaboy, CTO at Zymergen and co-author of the book The Missing README.Some key takeaways/thoughts from Dmitriy's point of view:Organizational design and change management is "like a knife fight" - you are going to get cut but if you do it well, you can choose where you get cut. There is no perfect org, and there will be pain somewhere but you can influence what will hurt, and make it not life-threatening.There is too much separation between data engineering and software engineering. Data engineering is just a type of software engineering with a focus on dealing with data. We have to stop treating them like completely different practices.When communicating internally, always focus on telling people the why before you get to the how. If they don't get why you are doing it, they are far less likely to be motivated to address the issue or opportunity. This applies to getting teams to take ownership of the data they produce, but also to everything else. There is often a rush to use tech over talk. Conversation is a powerful tool and will set you up so your tools can help you address the challenges once people are aligned. Paving over challenges with tech will not go well.Build your data platform such that the central data platform team is unnecessary in conversations between data producers and data consumers. That way, your team won't become a bottleneck. Try to reduce cognitive load on the users - they shouldn't have to deeply understand the platform and its inner workings just to leverage it."Data debt is debt forever." You can certainly 'pay it down' but data debt typically has a much longer life than even the initial source system that supplied the data. Take it on consciously.Looking to hire or grow full-stack engineers for an ever-growing definition of stack (backend, frontend, security, ops, QA, UX, data...) is probably not a great idea, we can't keep piling new domains on people and expect them to be good at all of them. Instead, look to build full-stack teams, and tools that look and feel sufficiently similar that e.g. "data engineering" becomes very close to "backend engineering".Look for needless delays in work as a sign your organization isn't well aligned with what you are trying to accomplish. The cost of coordination should not be a huge bottleneck, especially at a smaller size organization.Any agreement like a data contract or API needs to be agreed to by both parties. 
Dmitriy shared some of his experience leading the team that built and scaled Twitter's initial data platform. They had to transform from a relatively simple model that just wasn't scaling into one that was far more scalable but obviously more complex. For example, in their MySQL setup they couldn't add any more columns, so they treated certain columns almost like a CSV within a single cell - you can imagine how difficult that was for analytics… There were also many hidden dependencies, so when changes were made, they would break all kinds of other systems. Part of his role was helping the team untangle that mess.

One thing they got right, according to Dmitriy, was moving to structured logging so there were at least standard columns that made it possible to combine data across applications (a rough sketch of the idea follows below). Dmitriy didn't have much specific advice on exactly how to standardize; his advice was mostly about the discussions around standardizing - always share what you are trying to accomplish and focus on the why much more than the how. The how follows from decisions made once you've understood the why. This advice came up throughout the conversation.

Twitter had a small central data team that had to either try to scale massively and still not be able to know all the business context necessary or - more sanely - build their data platform to keep themselves from being a bottleneck. They chose the second option. So when building their data platform, there was a focus on building tools that made the central data team unnecessary in most data conversations, getting the data producers and consumers to talk to each other instead of the central team playing interpreter and doing the work themselves.

To sum up some advice from what Dmitriy learned at Twitter: 1) go for structured logging; 2) make dependencies clear and known, and limit them where possible; and 3) build your data platform tools so your data team doesn't have to be part of the conversations between data producers and consumers, to enable scaling.

According to Dmitriy, when focusing on change management, again you have to focus on explaining to people the why of what you are doing. When you give people the reason - the pain you are trying to solve - most people then just want to know what you need from them, in his experience. And be prepared to reiterate the why a lot. It's better to state it too often than not often enough.

When asked what we can take from software engineering to apply to data mesh and data engineering, Dmitriy is very passionate that data engineering and software engineering really shouldn't be overly separate. Data engineering is just a type of software engineering, and we need to create the tooling to make software engineers capable of doing data engineering tasks. We can take a lot of learnings from software engineering and apply them to data; it shouldn't be seen as very different.

When thinking about data mesh especially, we are asking domains to pick up additional responsibilities. Dmitriy believes - and Scott has said it multiple times too - that you can't give a team more responsibility without giving them more resources. That can be people - especially with incremental capabilities - or something like a platform that reduces the cognitive load and workload of the new - and existing - responsibilities. Give people a platform they can leverage without needing to be experts in what it is doing under the hood.
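To make the structured-logging idea a bit more concrete, here is a minimal sketch of application events emitted as JSON lines carrying a shared set of standard fields, so logs from different services can be combined downstream. The field names and the log_event helper are hypothetical - this is not Twitter's actual schema or tooling.

```python
import json
import time
import uuid


def log_event(service: str, user_id: str, event_type: str, **payload) -> str:
    """Emit one structured log line (JSON) carrying the agreed standard columns."""
    event = {
        # Standard columns every service emits, so events can be joined downstream.
        "event_id": str(uuid.uuid4()),
        "event_time": time.time(),
        "service": service,
        "user_id": user_id,
        "event_type": event_type,
        # Service-specific details stay in a nested payload so the
        # standard columns remain stable and queryable across applications.
        "payload": payload,
    }
    line = json.dumps(event)
    print(line)  # in practice this would go to a log pipeline, not stdout
    return line


# Two different services emitting events that can later be combined
# on the shared columns (e.g. user_id, event_time).
log_event("timeline", user_id="u123", event_type="view", item_id="t987")
log_event("search", user_id="u123", event_type="query", terms="data mesh")
```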
Full-stack engineers, especially if we add dealing with data to the stack, are just going to be overloaded. Look instead to full-stack teams where people within the team have some specialty areas.

Dmitriy shared some thoughts about how Zymergen's data capabilities and team have evolved. Initially, a central team had ownership over everything, which kept teams from talking to each other as much as needed. That introduced needless delays because the coordination cost was so high. So they reorganized and had a number of the data engineers move into the domains. Those data engineers were better able to leverage the data platform because they had built it, and they could teach other domain team members how to leverage it better too.

A lot of issues can be handled with just a conversation, not a technical solution, according to Dmitriy. This might seem obvious, but many people try not to go down this road. Technology is obviously also important - think about when to use tech and when to use talk. Your team should build out the platform so the tech is handled and it becomes easy to have the better conversations.

"Data debt is debt forever," according to Dmitriy, meaning that data has a habit of sticking around for a very long time, even well past when the initial source system has been replaced. You can pay down that data debt, but you have to address it intentionally; a simple refactor usually won't do it. Right now it's far easier to update an API than to evolve data - we need to get to a place where things like versioning data aren't such a manual, pain-in-the-butt task.

Dmitriy ended with a few things: 1) indirection is your friend, but don't use indirection until you need it; 2) a good place to learn about building your architecture to be able to evolve is the book Building Evolutionary Architectures (kind of well titled, huh?); and 3) organizational design and change management is like a knife fight - you are going to get cut, but if you do it well, if you are a pro, you will choose where you get cut.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
