Data Mesh Radio cover image

Data Mesh Radio

Latest episodes

Nov 9, 2022 • 15min

#152 Zhamak's Corner 8 - Are We Using Tech Gold as a Paperweight?

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Sponsored by NextData, Zhamak's company that is helping ease data product creation.
For more great content from Zhamak, check out her book on data mesh, a book she collaborated on, her LinkedIn, and her Twitter.
What tech is already available that could be used for data mesh? There are so many amazing approaches and technologies in data, but they've been used only for the pipeline approach. We need to think more like developers - not accepting the grunt work or death by a thousand cuts of data - and take a hard look at what we've done historically in data and what should be replaced.
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Data Mesh Radio episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Nov 7, 2022 • 1h 25min

#151 Driving Interoperability via Taxonomies and Tagging to Power Personalization - Interview w/ Jill Maffeo

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Jill's LinkedIn: https://www.linkedin.com/in/jillianmaffeo/
Developing Interoperable Channel Domain Data (blog post): https://vista.io/blog/developing-interoperable-channel-domain-data
In this episode, Scott interviewed Jill Maffeo, Senior Data Product Manager at Vista.
Before jumping in: Jill gives a lot of very useful examples of outcomes they've been able to drive that could be abstracted to apply to your own organization's business challenges - outcomes like better customer segmentation, faster time to launch new offerings, etc. If you are having difficulty with stakeholder buy-in, especially from someone in marketing, this episode could help you frame things in their language.
Some key takeaways/thoughts from Jill's point of view:
- "When you're thinking about interoperability, it's just playing nice, right?" If you think of interoperability as a key part of your culture, it's easier to implement. Let people know why interoperability is good for them and for the whole company.
- Taxonomies help drive interoperability because there is already an established language, even if things don't fit perfectly. New concepts can emerge and your taxonomies should change, but a taxonomy gives interoperability discussions at least a common starting point.
- Taxonomies are a living thing - make sure they aren't overly rigid and be prepared to continually evolve and improve them.
- Within your taxonomy structure, if there is a reason for things to be unique for a domain or use case, that is okay. Look for potential ways to also convert that data to best fit your taxonomy, but you don't want to force a square peg through a round hole.
- Taxonomies really start to add a lot of value at scale. They are somewhat costly upfront, with likely moderate return on investment early; but, if you do them right, they will pay back a lot as you move forward. They make historical analysis - especially with interoperability - far easier because you've done the work ahead of time.
- Taxonomy, when done well, is about balancing standardization and flexibility. Much like most things in data mesh, it's about finding the right balance for your organization.
- Most customer journeys are cross domain. The more you make domain data interoperable, the more insight you have into the customer - insight that can drive better business for your organization as a whole instead of only trying to locally optimize value by domain.
- Similarly, many executive questions and desired insights are not exclusive to a domain. Can we get to answering their questions much more quickly and completely? What value would that drive?
- To get funding for longer-term initiatives with a payoff down the road, like taxonomies, look to directly attach your work to execs' strategic goals. Make it easy for them to see why this will be of value to invest in now rather than later.
- If a line of business isn't ready to engage on a key long-term strategic goal like very rich metadata, look for places to make progress that you control. That way, you can capture more value from the data being generated now, so when you look to reengage, there is more incentive.
- When re-engaging with a team that declined to work with you the first time, use as much empathy as possible. Yes, they said no, but now is the time to welcome them to the party and invite them in. No 'I told you so', only 'great, how can we help?'
- It's easy to lose sight of differentiation in metrics. With a good taxonomy and metadata strategy, you can differentiate better between actions - e.g. not how many emails did you send but what type and why? Five marketing emails in a week is probably bad, but four order-related emails and one marketing email isn't.
- It's okay - and probably advisable - to have your taxonomy and tagging serve multiple use cases. A postcard and a wedding invite look the same to manufacturing but are marketed very differently.
- A potential way to entice people to participate earlier in a data mesh journey: let them know they will have more say and influence on the general choices. However, you still want domains that are just starting to participate to know that you want their feedback and that their voice also matters.
- Think about building interoperability like building a structure. You need everyone coordinating - you want the electrician and the plumber to do their work before the drywaller. Redoing the work instead of setting your build schedule in place doesn't sound like a good idea.
- Focus on getting your standards in place for interoperability. Scott Note: no one is publishing their standards and this NEEDS to be done. People need to share their standards explicitly because everyone is reinventing them.
- When picking early use cases, potentially look for complementary use cases where having the information needed for each individual use case can power even further use cases.
- Good - even if only rudimentary - taxonomy and tagging lets you easily see how things have changed over time without having to manually stitch data. It also lowers the barrier to external domains leveraging your data.
- A key benefit of taxonomies is the ability to tell richer stories. You have data across many different domains or business outcomes, but you can see how they interplay.
- ?Controversial?: Standard tagging and taxonomy can remove - and even prevent - tech debt, partially because it prevents some manual stitching of data.
- A good check for teams helping manage a data product portfolio is to take a few stakeholder questions and use what's available to try to answer them. How was the user experience? Is the information easy to understand? Etc.
Jill started off by talking about her current role, where she manages a "team of products that span the gamut between data ingestion, data curation, metadata curation and creation, and also [the] upper funnel portfolio around measurement and insight from a demand perspective." Essentially, she is managing a suite of data products around marketing that covers a wide variety of needs - some more technical and behind the scenes, and many on the front line of powering analytics.
Taxonomy can be a significant help in driving interoperability. Taxonomy is ultimately about finding the right balance between flexibility and standardization, according to Jill. Much like many decisions in data mesh, it isn't black or white - it's about finding the happy balance somewhere on the spectrum of grays. Having simple tagging and taxonomy has allowed them to see how something like marketing materials - e.g. postcards and flyers - has evolved over time without having to manually connect the data each time; when a new product is launched, it is added to the tagging and product hierarchy.
Jill also believes that good tagging and taxonomy remove - and prevent - tech debt. It means you have an easier time generating new insights without having to do manual stitching on a one-off basis, and it creates a much lower barrier to people leveraging the data from outside the domain - if they can understand the general taxonomy, they don't have to be as deep in the context to leverage the data effectively. And taxonomy can help more effectively share metadata to let other domains really understand what a domain is doing.
What Jill and her team saw early in their data journey was that each team was doing a good job tracking their metrics, but all the metrics were siloed. Even though the channels were run independently, the organization obviously needed information that could span them. So Jill and team started to really classify and say something like 'hey, it looks like you are doing what you call X and team B is doing something they call Y, but we'll map both to Z.' That way, it's far easier to look at a customer journey across domains and silos. You can tell much richer stories.
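As a rough illustration of that kind of mapping (the domains, labels, and taxonomy terms below are hypothetical, not Vista's actual taxonomy), a shared taxonomy can sit alongside each domain's local labels so cross-domain journey questions become a lookup rather than a one-off stitching exercise:

```python
# Hypothetical sketch: translate each domain's local labels onto a shared
# marketing taxonomy so customer journeys can be read across domains.

SHARED_TAXONOMY = {
    # (domain, local label) -> canonical taxonomy term
    ("email_team", "promo_blast"): "marketing.email.promotional",
    ("email_team", "order_update"): "transactional.email.order",
    ("paid_media", "retargeting_ad"): "marketing.display.retargeting",
}

def to_canonical(domain: str, local_label: str) -> str:
    """Return the shared taxonomy term, or flag the label as unmapped."""
    return SHARED_TAXONOMY.get((domain, local_label), f"unmapped.{domain}.{local_label}")

# Events tagged with whatever each team happens to call things locally.
events = [
    {"customer_id": 42, "domain": "email_team", "label": "promo_blast"},
    {"customer_id": 42, "domain": "paid_media", "label": "retargeting_ad"},
    {"customer_id": 42, "domain": "email_team", "label": "order_update"},
]

# A cross-domain view of one customer's journey, in shared terms.
journey = [to_canonical(e["domain"], e["label"]) for e in events]
print(journey)
```

Labels that don't fit yet simply surface as "unmapped", which is one way to let the taxonomy evolve over time rather than forcing a square peg through a round hole.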
Where Jill really sees the biggest value with taxonomies is at scale. To start out with, they are helpful, but as you add more and more complexity to your data mesh implementation, instead of combining data by manually integrating data products with each other, you can quickly see historical trends across multiple domains to get a much bigger picture. The integration work, at least at the concept level, was done for you. But taxonomies are also not a magic bullet.
A key strategic goal at Vista over the last few years has been personalization, according to Jill. Being able to see the big picture of customer journeys and intersect with them with the right offerings at the right time - that sounds like business nirvana. But if you can't see everything that is happening across your many offerings, is that really possible? Or are you at least preparing yourself with historical data for a potential ML model down the road? Getting the right metadata in place early - that early investment - set them up for value down the road.
Jill talked about how, after doing her PoC around taxonomy and metadata creation and curation, some lines of business weren't really ready to engage. But instead of putting the taxonomy project on hold, she looked for additional places where she could make progress so that, when those lines of business were ready, the value creation at that stage wouldn't have missed all the necessary data in the meantime. Essentially, she looked to avoid the loop of 'it's not of immediate value but we'll get to it eventually', followed six months down the road by 'oh, if only we had this data over the last six months, then this would be of value.' Once what Jill and team implemented started to show early wins, they circled back with the business to say 'hey, this is generating value, here's how it saved you some effort', so the business people were more ready. And the teams they had helped already started to advocate for more of the work too. That gave Jill's team proof points when going to execs and other stakeholders that investing time in metadata and taxonomy is driving value - and that their participation would drive more value for all.
A key win for Vista, according to Jill, has been moving from a metric like how many emails were sent to someone in a certain period to what kind of emails - was it all promotional or were some of them based on an order? Getting more granular about the 'what' has helped them reduce email opt-out rates, for example. The next phase of that is to look at the specific interactions a customer had and develop a next best action model - what can the company do to drive more business that is highly relevant? Again, that personalization. Jill gave a number of additional useful examples of what tagging and taxonomy are driving for them.
Jill also discussed how there shouldn't be one overarching taxonomy that everything must adhere to - you want to look at data from multiple angles. At Vista, customers may be looking for different things - like a postcard versus a wedding invite - but to manufacturing, it all looks the same. And you again want to apply personalization, so you don't ask someone if they'd like to create another wedding invite a year later… But if they are a restaurant that orders menus every 3 months with a change in the season, you want to 1) send them promotions ahead of time and 2) if they are late in their order pattern, potentially escalate to do something additional.
Especially when looking at taxonomies, Jill believes you can entice people to participate earlier by pointing out that they will influence the overall choices more than those who come later - they get more say. But it's important to balance that as you bring on new teams: their voice matters and their feedback is important to continue to improve and evolve your taxonomies.
Jill gave the example of building a home as to why we need to be coordinated in how we generate our data, with things like a unique ID: do you want the drywaller to do their work before or after the electricians and plumbers have done theirs? Getting the coordination done upfront on how you will look to combine data, even if not perfect, can save you a fair amount of time, money, and headaches down the road.
Most customer journeys are cross domain. Jill showed how the more you make domain data interoperable, the more insight you have into the customer - insight that can drive better business for your organization as a whole (globally maximize customer value) instead of only trying to locally optimize value by domain. And for many - most? - of your executives, their questions and desired insights are rarely confined to one domain. So how are you working to really answer their questions and build the ability to answer cross-domain questions?
Jill talked about phase one versus phase two of a data mesh journey. In phase one, you are focused on creating data products to meet specific use cases, and it's pretty easy to end up with some overlap. When you get to looking at your data products as a full suite, those overlaps look like extra cost. So early in your journey, make sure domains are communicating about what they are building to prevent doubled-up work. It will still probably happen, and that's okay. But part of product thinking and product management is portfolio management.
Quick tidbits:
- A good check for teams helping manage a data product portfolio is to take a few stakeholder questions and use what's available to try to answer them. How was the user experience? Is the information easy to understand? Etc. That will inform future platform improvements and new data products.
- "When you're thinking about interoperability, it's just playing nice, right?" If you think of interoperability as a key part of your culture, it's far easier to implement. Let people know why interoperability is good for them and the whole company.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Nov 6, 2022 • 26min

Weekly Episode Summaries and Programming Notes – Week of November 6, 2022

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Nov 4, 2022 • 1h 23min

#150 3 Years in, Data Mesh at eDreams: Small Data Products, Consumer Burden, and Iterating to Success, Oh My! - Interview w/ Carlos Saona

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Carlos' LinkedIn: https://www.linkedin.com/in/carlos-saona-vazquez/
In this episode, Scott interviewed Carlos Saona, Chief Architect at eDreams ODIGEO.
As a caveat before jumping in: Carlos believes it's too hard to say their experience or learnings will apply to everyone, or that he necessarily recommends anything they have done specifically, but he has learned a lot of very interesting things to date. Keep that perspective in mind when reading this summary.
Some key takeaways/thoughts from Carlos' point of view:
- eDreams' implementation is quite unique in that they were working on it without being in contact with other data mesh implementers for most of the last 3 years - until just recently. So they have learnings from non-typical approaches that are working for them.
- You should not look to create a single data model upfront. That's part of what has caused such an issue for the data warehouse - it's inflexible and doesn't really end up fitting needs. But you should look to iterate towards that standard model as you learn more and more about your use cases.
- ?Controversial?: Look to push as much of the burden as is reasonable onto the data consumers. That means the stitching between data products, the compute costs of consuming, etc. They get the benefit, so they should take on the burden. Things like data quality are still on the shoulders of producers.
- You should provide default values for your data product SLAs. It makes the discussion between consumers and producers far easier - is the default good enough or not?
- ?Extremely Controversial?: At eDreams, you cannot publish data in your data product that you are not generating. In derived domains (e.g., customer history), "generate" includes the derived stitching. NOTE: Go about an hour into the interview - not episode - for more specifics.
- When starting with data mesh, there must be a settling period - consumers must understand that things are subject to change while a new producer really figures things out for the first few weeks to months.
- You want to avoid duplicating data. But you REALLY want to avoid duplicating business logic.
- Be careful when selecting your initial data mesh use cases. If a use case requires a very fast time to market, while it has value, you likely won't have the time and space necessary to experiment and learn. You need to find repeatable patterns to scale in data mesh; hurrying is a way to miss the necessary learning.
- Look ahead and build ahead for obvious interoperability. E.g. create foreign keys for data products that don't exist yet but will.
- Be clear about what early data mesh participation means - what will it net domains that are part of the early implementation? And be specific too about what your early implementation won't include or achieve. Don't over promise and under deliver.
- Similarly, strongly emphasize that learning is a priority in your early implementation and that you are factoring learning into promises and estimations. You can't promise you'll find the right solution to a challenge on day one; things need space to evolve as you learn more.
- It's okay to not have everyone as part of your initial implementation - engagement or even buy-in - but set yourself on a path where their participation is inevitable.
- Making data a first class citizen doesn't just happen. There is incremental work to be done by the domains. Make sure you reserve time to actually do that work - the data quanta creation and maintenance.
- It is not feasible to have your documentation be fully self-describing for everyone. eDreams chose to set the bar at documentation that is self-describing for readers who already know the domain. For readers who do not know the domain, that introduction must happen somewhere else.
- At the start of a data mesh journey, your central team will likely control all the use cases being served by the mesh. But at some point, self-serve needs to happen. Consumers need to be able to serve their needs without their use cases going through the central team.
- With small-size data products and the data combination burden being on consumers, versioning tends to not be as much of a problem because the concepts of domain events don't change that often. Or when they do, the retention window has typically been short, making versioning easier.
- To make data producers feel a better sense of ownership: 1) look for ways for producers to better leverage their own data; 2) maximize the number of consumers of their data quanta so issues with the data product are identified sooner - more eyes means more people who can spot issues; and 3) create automation to easily/quickly let domains identify sources of data loss rather than searching: with proper setup, you can make it easy to identify whether the data pipeline is the problem. If it's not, then the issue is in the domain.
When Carlos and team were looking at how to tackle their growing data challenges a few years ago, they were reviewing requests for proposals (RFPs) from a number of data consultancies around building out a data lake but just were not convinced it would work. Then they ran across Zhamak's first data mesh article and decided to give it a try themselves. Until more recently, Carlos was not aware of the mass upswing in hype and buzz around data mesh, so their implementation is very interesting because it wasn't really influenced by other implementations.
When they were starting out, Carlos said they didn't want to try to create a single, overarching approach. It was very much about finding how to do data mesh incrementally. They started use case by use case and built it out organically, including the design principles and rules - they knew they couldn't start with a single data model, for instance. But it was quite challenging iterating towards that standard data model.
When choosing their initial use cases to try for data mesh, Carlos and team had some specific criteria. They rejected anything that needed a very quick turnaround because it wouldn't let them have enough time/space to try things, learn, and iterate. They did plan ahead by creating foreign keys to data products that didn't exist yet, to make interoperability easier down the road when those products would exist.
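A minimal sketch of what "building ahead for obvious interoperability" could look like - all product and column names here are hypothetical, not eDreams' actual schemas - is to declare the foreign key in today's data product even though the product it points at doesn't exist yet:

```python
# Hypothetical sketch: a source-aligned data product declares a foreign key
# to a planned-but-not-yet-published data product. Consumers carry the key
# today, and joins become possible the day the other product ships.

booking_events_product = {
    "name": "booking-events",
    "owner": "bookings-domain",
    "schema": {
        "booking_id": "string",
        "customer_id": "string",   # FK -> customer-profile.customer_id (planned)
        "booked_at": "timestamp",
        "amount_eur": "decimal",
    },
    "foreign_keys": [
        {
            "column": "customer_id",
            "references": "customer-profile.customer_id",
            "status": "planned",  # target data product does not exist yet
        }
    ],
}
```

The design choice is simply that the key is stable and documented from day one, so downstream consumers never have to backfill identifiers once the referenced product arrives.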
They were also very honest with stakeholders about what early participation meant - and what it didn't mean; that way, it was clear what benefits stakeholders could expect.
According to Carlos, while they had executive support and sponsorship for data mesh, that wasn't enough to move forward with confidence at the start. They needed a few key stakeholders who were engaged as well and wanted to participate. It was also okay to have some stakeholders not engaged but just informed of what they were trying to do with data mesh. You don't have to win everyone over before starting.
Five things Carlos thinks others embarking on a data mesh journey should really take from their learnings: 1) it's okay to not have everyone really bought in or especially engaged upfront, but they will have to participate - make their eventual participation inevitable. 2) Really emphasize that you are learning in your early journey, not that you have it figured out - and factor in learning when making estimations and promises. 3) Don't try to design your data model from the beginning; you need to learn via iteration - you will start to find the standards that make it easy to design new data products. 4) When treating data as a first class citizen, understand that it will take additional time. Reserve the team's time to create and maintain their data quanta. 5) Let the use cases drive you forward and show you where to go.
Carlos' philosophy is: within reason, push as much of the burden onto the consumer as you can. Obviously, we don't want consumers doing the data cleansing work - that's been one of the key issues with the data lake - but the costs of consumption should fall on the data consumers as they are the ones deriving the most benefit. So eDreams makes the consumers own stitching data products together for their queries and makes them pay for the consumption. This minimizes the costs - including maintenance costs - to producers.
One very interesting and somewhat unique approach - at least as far as Scott has seen - is how truly small Carlos and team's data quanta are. Thus far, they have really adhered to the concept that each data quantum should only be about sharing a single type of domain event and really nothing more. This again makes for lower complexity and maintenance costs for data producers. They are considering changes with upcoming BI-focused data products, so that is to be determined.
Carlos believes - and Scott exceedingly strongly agrees - that it is not feasible for your data quanta documentation to be fully self-describing. You can't know someone else's context. You need to write good documentation so people can still understand what the data product is and what it's trying to share, but for a reader without knowledge of the domain, it would take a considerable amount of effort - essentially impossible to do right - to fully explain the domain and how it works in the documentation of each data product. Getting to know exactly how other domains work is outside the scope of the data mesh.
At the start of their journey, the data team was in control of all the use cases, who was consuming, and who was producing, according to Carlos. But as they've gone wider and there is a self-service model for data consumers, more and more of the use cases are directly between producers and consumers - or the consumers are consuming without much interaction with producers if they already know the domain. It could become an issue with people trying to understand data from lots of different domains just for the sake of understanding, but it hasn't been an issue so far.
To date, Carlos hasn't seen many problems around versioning. They thought they would have many more issues with versioning than they have, which Carlos attributes to keeping their data products as small as possible and using domain events. When they have had to version, the retention window for the data has been relatively short, so moving to the newer version has been relatively simple. And because most people are getting their data from source-aligned data products, changes have a smaller blast radius - they won't affect data products that are downstream of a downstream of a downstream data product. Domain events have been enough because their main stakeholder has been machine learning. They are now working on a different kind of data quantum for consumers such as BI, and they plan to include more governed versioning there.
One of the biggest challenges early on, according to Carlos, was that domains didn't really feel ownership over the data they shared. So to increase the feeling of ownership, they first looked for ways for producing domains to use their own data - as many other guests have mentioned. Second, they tried to maximize additional consumers of data products by looking for use cases. That led to faster feedback loops if there was a problem - more eyes on the data - so producers discovered issues sooner. And third, the platform team helped identify issues that might be in the system or in the data platform/pipeline process - if there was data loss, automation helps identify whether it is on the platform side; if it's not on the platform side, then it is an issue within the domain. That one piece of automation has meant a lot less time searching for the cause of data loss and more time fixing it.
Carlos and team built in a few different layers of governance. The first is a universal layer of standard metadata in each data product - when something happened, who the owner is, the version of the schema, the existence of a schema, etc. These are enforced automatically by the data platform, and you can't put a data product on the mesh without complying. Producers must also tag any PII or sensitive information like credit cards. The second layer is policies for data contracts between producers and consumers. As many guests have suggested, they have found that having default values for SLAs in data contracts provides a great starting point for discussions between data producers and consumers.
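As a hedged sketch of that idea (illustrative field names and values only, not eDreams' actual defaults), platform-supplied defaults turn the negotiation into "is the default good enough?" rather than starting from a blank page:

```python
# Hypothetical sketch: default SLA values for a data contract, with the
# producer and consumer only discussing the fields they need to override.

DEFAULT_SLA = {
    "freshness_minutes": 60,        # data lands within an hour of the event
    "completeness_pct": 99.0,       # expected share of events delivered
    "retention_days": 30,           # how long history is kept
    "schema_change_notice_days": 14,
}

def build_contract(producer, consumer, overrides=None):
    """Start from the platform defaults and apply only agreed overrides."""
    sla = {**DEFAULT_SLA, **(overrides or {})}
    return {"producer": producer, "consumer": consumer, "sla": sla}

# A consumer that needs fresher data negotiates that one field and nothing else.
contract = build_contract("bookings-domain", "ml-ranking-team",
                          overrides={"freshness_minutes": 15})
print(contract["sla"])
```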
"You can have your cake and eat it too" using domain events, per Carlos. You don't want direct operational-path queries hitting your data quanta, as they are designed for analytical queries and have a separate latency profile. But at eDreams, the pipeline that writes data quanta to the analytical repository is implemented with streams that can be consumed in real time by operational consumers (microservices).
Other tidbits:
- When launching a new data product, there must be a settling period - consumers must understand that things are subject to change while the producer really figures things out.
- You want to avoid duplicating data. But you REALLY want to avoid duplicating business logic.
- Data products should have customized SLAs based on use cases. You don't need to optimize for everything. Let the needs drive the SLAs.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Nov 1, 2022 • 12min

#149 Data Mesh and Community: The Future? - Mesh Musings 34

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Oct 31, 2022 • 1h 19min

#148 It's A-Okay to Solve for Today: ANZ Plus's Early Data Mesh Success - Interview w/ Adelle McDonald

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Adelle's LinkedIn: https://www.linkedin.com/in/adelle-mcdonald-79a9a2139/
In this episode, Scott interviewed Adelle McDonald, Customer and Origination Lead at ANZ Plus, a bank in Australia and New Zealand.
Some key takeaways/thoughts from Adelle's point of view:
- To drive buy-in through 1:1 conversations with domain owners - the business leaders - you will have to tailor your conversation to each person. Listen to their pain and reflect it back to them.
- Focus on the ability to quickly pivot at low cost. That can mean things aren't as product-worthy to start, but it means you can evolve towards value more quickly.
- Addressing domain owners' pain points gets them looking at you as a partner. They will be much more willing to work with you, especially as you partner to provide actionable insights.
- ?Controversial?: ANZ Plus is embedding data leads into domains to handle the data quanta for the domain and also build the team what they need from data. As part of that, they are slowly building up the domains' capabilities to handle their own data. This minimizes friction and creates buy-in but is likely not sustainable long term - ownership will need to be transferred.
- It is very important to tie data quanta to use cases - driving value for users means focusing on use cases.
- Developers or software engineers owning data is complicated. Make it so they can start to make small changes and learn in a safe way instead of dumping all ownership on them at once. Ownership and knowledge aren't a switch you flip.
- Using a git-based, pull request approach, developers can attempt data work without manual stitching so they learn to do the work themselves; but it can still be easily overseen by someone with more data expertise.
- One way to potentially drive executive buy-in is joint/collaborative KPIs. So it's not just about their domain's results but how well they work and drive results with another domain.
- ?Controversial?: It's okay to have a data asset with murky long-term ownership at first. If usage picks up, you want to convert it to a proper data quantum, but we need to be able to test the waters with things and see if they are actually useful first. Clarity comes with usage.
- When creating anything data related, use a software development lifecycle (SDLC) approach. Domains may create something exclusively internal to the domain, but once you look to share externally, you have rules, standards, and best practices. Move from the pipeline approach to the software approach to data.
- Automatically generated documentation can considerably help with governance. You have it in the same repo, and it can handle a large part of explaining what is happening with the data, making other decisions easier.
- Automate your governance checklist as much as possible so you prevent the manual work of governance. No gates, simple checks - that's a winning, low-friction way to govern.
- When you don't automate your governance checklist, domains often feel they need to invent or buy tooling to comply with governance. Making it frictionless to check through automation means far less complexity and fewer issues.
Adelle started the conversation on driving buy-in and how important it is to tailor your message. As prior guests have also noted, the easiest way to drive buy-in is by helping out the person whose buy-in you are trying to get as part of the process. So find their needs and help drive to a positive outcome for them with data first.
When it comes to getting buy-in from domain owners, Adelle has seen that finding their pain points and finding good ways to address them will get them to see you as a partner in leveraging their data. They will be much more willing to work with you rather than you simply putting new responsibilities on their plate. You can work with them to ensure their information - especially purchased data - is providing value and gives people actionable insights, not just interesting insights. It may be a tough pill to swallow, but you need them to see you as that partner in the long run.
At ANZ Plus, they are embedding data leads into the domains to be the main point of data contact for external domains, according to Adelle. Those data leads serve the domains by helping to address their internal business needs with data while also creating the data sharing mechanisms - the actual data quanta in data mesh terms - for sharing that domain's data across the rest of the organization. With this work falling on the data leads, ANZ Plus is not generally asking domains to take on too much responsibility relative to data - at least not yet. This minimizes the work the domains have to take on but still significantly accelerates the time to business value within the domain for new data use cases. As a result, the domain owners have been very happy to work with the data leads because there isn't much incremental work they are responsible for - at least at first.
Adelle and the data mesh fans/leaders at ANZ Plus are aware that their data ownership model is probably not the right fit in the long run. But it's working well for them right now, and that's what matters to them. They have found a setup that doesn't add a ton of overhead process and will evolve as the capabilities and resources to hand over actual data ownership get built out more and more. When they evolve, they are focused on maintaining the ways of sharing context rather than trying to keep the exact data quanta as is. But if your organization isn't very clear that things will constantly evolve, this could be a very hard setup to maintain in the long run.
As part of their plans for evolution, Adelle mentioned that they are focusing on maintaining the ability - at a low overall cost - to pivot. As the world around them changes and evolves, and as they learn more by acting on the actionable insights they generate, the team needs to make sure they are evolving along with the world and their markets to drive more value. They are focusing on getting to the right insights as fast as possible in sustainable ways.
Thus far, Adelle and team are finding it's very important to tie use cases to data quanta. To drive value for customers, they need to focus on the use cases. This is especially relevant when looking at customer journeys - you want to set yourself up to collect the right data to analyze to understand what's going on with that customer journey.
Adelle emphasized the need to create an environment for safe evolution by developers relative to data. At first, developers won't know how to deal with data in general as an entire concept, but most will understand at least some aspects of working with data. So how can you get them more and more used to dealing with data and learning more? By providing them a way to make changes when they need to, in a safe and easy way. Teaching developers and software engineers to deal with data isn't a switch you flip, just like handing over ownership - it's best done as a gradual process. Easier said than done, of course.
According to Adelle, it's okay to have somewhat murky data ownership at first for a new potential data asset. If it starts to get broader use, you need to lock in who will own it, how, and why, but you don't need to get ahead of yourself and drive towards a perfect data quantum every time you look to share data. Have high-context exchange with other users, let them know how much they can trust things, but also be in a mode of trying things out and seeing where they might go before investing the time into data quantum creation. This gives domains more freedom to play with their data, but data consumers also must be flexible, since what they are consuming might evolve as it gets molded into something more scalable, usable, trustable, etc. Without an ability to evolve quickly, this model will likely not work.
For Adelle, it's crucial to think about data like any other software development. Your software development lifecycle (SDLC) needs to have things like governance and API interfaces as part of the development. As an example on the operational side, domains can build small internal apps that aren't good for external domains while they figure things out and test. But once other domains need access, you have to start treating your sharing access like a product. The same goes for data. You should have standard practices internally to make this low friction.
Automated documentation has been a big win for governance, according to Adelle. While you still need additional documentation, if you can have the base-level documentation auto-generated in the repo with your data quantum code, that's very helpful for sharing what you are doing with data and why. It also makes other governance decisions easier because people can see what is happening and what the information is about. They've automated much of their governance checklist as part of their software development lifecycle as well, so people can test whether they meet governance requirements as they are developing, not at a gate at the end.
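A minimal sketch of what such an automated, no-gates governance check might look like - the fields and rules below are hypothetical, not ANZ Plus's actual checklist - is something a pipeline can run on every pull request so developers see failures while developing rather than at a review gate:

```python
# Hypothetical sketch: run a governance checklist against a data product
# descriptor in CI, so compliance is a fast automated check, not a manual gate.

REQUIRED_FIELDS = ["name", "owner", "purpose", "schema", "documentation_url"]

def governance_findings(descriptor):
    """Return a list of human-readable findings; an empty list means pass."""
    findings = [f"missing required field: {f}" for f in REQUIRED_FIELDS if f not in descriptor]
    # "Only collect data for a specific reason" - require a stated purpose.
    if not descriptor.get("purpose"):
        findings.append("no purpose stated for collecting this data")
    # Any column flagged as sensitive must carry a handling policy.
    for column, props in descriptor.get("schema", {}).items():
        if props.get("sensitive") and not props.get("handling_policy"):
            findings.append(f"sensitive column '{column}' has no handling policy")
    return findings

descriptor = {
    "name": "customer-onboarding-events",
    "owner": "origination-domain",
    "purpose": "measure drop-off in the onboarding journey",
    "documentation_url": "https://example.internal/docs/customer-onboarding-events",
    "schema": {
        "customer_id": {"type": "string", "sensitive": True, "handling_policy": "tokenised"},
        "step": {"type": "string"},
    },
}

findings = governance_findings(descriptor)
assert not findings, findings  # fail the CI job and list the findings if any
```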
A few tidbits:
- Only collect data for a specific reason. If you don't know specifically why you are collecting it, why are you collecting it?
- By making incremental data work requests a pull-based system (think pull requests in git), you can work somewhat asynchronously and developers can learn to attempt data work in a safe environment.
- Before investing in doing data work, ask whether it is recurring, what the value is, and why you are doing it. If it will be recurring work, look to automate as much as possible first. It might not yet be data quantum worthy, but it sets you up for when that time comes.
- One way to potentially drive executive buy-in is joint/collaborative KPIs. So it's not just about their domain's results but how well they work and drive results with another domain.
- When you don't automate your governance checklist, domains often feel they need to invent or buy tooling to comply. Making it frictionless to check against governance requirements, as part of the SDLC, means far less complexity and fewer issues.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Oct 30, 2022 • 28min

Weekly Episode Summaries and Programming Notes – Week of October 30, 2022

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Oct 28, 2022 • 1h 4min

#147 Mapping Out Your Data Product Suite - Building Your Roadmap to Maximizing Business Value - Interview w/ Gunjan Aggarwal

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Gunjan's LinkedIn: https://www.linkedin.com/in/gunjanaggarwal/
Gunjan's Medium: https://gunjan-aggarwal.medium.com/
In this episode, Scott interviewed Gunjan Aggarwal, Head, Digital Data Products and MarTech Strategy at Novartis. To be clear, she was only representing her own views on the episode.
Some key takeaways/thoughts from Gunjan's point of view:
- Set your overall data product strategy - for when you are in stage 2, going wider with data mesh - earlier in your journey than many may think. It's easy to focus only on use cases instead of the bigger picture.
- Make sure to align early on who owns what and what the clear boundaries between roles are. Otherwise, with the amount of change data mesh drives, there will likely be unnecessary chaos. Get specific.
- Don't fall for the 'Data Field of Dreams' - "if you build it, they will come." Focus on building to actual problem statements. Involve people early, make them accountable, give them skin in the game and they will care.
- "The more you ask why, the more clarity you will get." Really dig deep into the reasoning for creating new data products or ARDs (analytics ready datasets). If we have this data product, what will it unlock for us?
- It's crucial to avoid the trap of building data products specifically to single use cases. You must have the bigger picture in mind and focus on reusability instead of only solving one set of challenges. Can you extend an existing data product?
- Data people should have domain knowledge where possible. That way, they can push back on requirements that don't make economic sense - that don't maximize the return on investment.
- A 4-part approach to designing data products: 1) find clarity on the problem statement; 2) assess who the personas are that will benefit from it; 3) dig into what you already have available; and 4) focus on serving value to the problem statement in a way the persona can use.
- ?Controversial?: Scalability is more important than time to market when it comes to data products, especially as you develop a broader set of data products. Tech debt around scaling is hard to combat just as you are delivering strong value with the need for scale. And it limits additional use cases leveraging existing data products if they can't scale.
- Look to provide as many easy paths as possible for new data products: templates, blueprints, standard schema, a global taxonomy, etc. Teams don't have to use them, but they are great starting points.
- You need to be proactive in partnering with the business. Data people have historically waited for requests/requirements. That won't lead to fast feedback loops and quickly iterating towards value.
- It's easy to end up focusing on the single use case instead of the bigger picture. But this will likely result in business disruption from a half-baked product.
Gunjan is in phase 2 of a data mesh implementation at the moment - the going-wide phase. As part of that, she's looking at how to create a suite of data products to serve the needs of a broader set of use cases and at putting things in place to more easily serve ad hoc querying. She recommends setting your data product strategy for your long-term needs earlier in your journey than most might think - what is the real business strategy for your mesh as a bigger entity than just individual data products? How will they actually work together so 1 + 1 = 3?
According to Gunjan, it is very important to clearly define boundaries and responsibilities for roles. It's easy to get confused about what a data product owner is versus a product manager, for example. Look to the RACI model for defining things clearly. If there is a lot of change and unclear responsibilities, that can cause lots of challenges and chaos. If you don't have alignment early, it's very easy to go wrong. So make sure you spend time before moving forward with data mesh to really focus and align on why you are doing this and who will own what. Start with the end goal in mind and march forward together.
It's crucial to involve people early in every data product you develop, in Gunjan's experience. If you build something for them instead of building it with them, they are far less likely to buy in. Make people accountable - their data product "is their baby". Make them part of defining success for a data product and work with them to make sure they can scale it up when it succeeds.
For Gunjan, when considering new data products, always start with asking why. Why do we think this will drive incremental value? Why is this the right time? Dig in layer by layer to understand why this is a good use of our time and what our expected business value is from doing it. It's easy to miss the forest for the trees. This method also makes finding reusability more likely. Why can't you use what is already built? And if there is a good answer, make sure to build your data products so they are reusable for other use cases down the line.
Digging deeper into reusability and extensibility: when you look at new use cases, consider whether you need new data products to support them or whether you can modify and extend existing data products instead. It's quite easy to build data products to try to support each use case individually, but that will quickly overwhelm your teams. Look at the greater whole for how you can support your needs with a suite of data products.
Where possible, Gunjan believes it's best if your data people have domain knowledge - they can push back on the cost/benefit of choices far better than someone without that specific knowledge. Do you actually need real-time? What is the impact of different SLAs on the return and on the cost to create/maintain? It's far easier to maximize return on investment if one person understands both sides of the equation.
Gunjan and team have a four-part approach to building out new data products: 1) "Find the clarity on the problem statement." What exactly is the use case you will be serving? 2) "Who are your personas you are serving and what will they require?" Get specific about who you are trying to serve. 3) "What is available?" Dig into what you already have and evaluate if what exists can serve the use case(s). 4) Focus on providing the value to serve the problem statement in a way the personas can benefit from.
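One way to picture that four-part approach is a lightweight intake template that every proposed data product fills in before build starts - the field names below are illustrative, not Gunjan's or Novartis's actual template:

```python
# Hypothetical sketch: an intake template mirroring the four questions -
# problem statement, personas, what already exists, and the value to serve.

from dataclasses import dataclass, field

@dataclass
class DataProductProposal:
    problem_statement: str                          # 1) clarity on the problem being solved
    personas: list = field(default_factory=list)    # 2) who benefits and what they require
    existing_assets_reviewed: list = field(default_factory=list)  # 3) what is already available
    reuse_or_extend: str = ""                       # can an existing data product be extended instead?
    value_to_deliver: str = ""                      # 4) the value served, in the personas' terms

proposal = DataProductProposal(
    problem_statement="Field teams cannot see channel engagement per account",
    personas=["field rep", "brand analytics lead"],
    existing_assets_reviewed=["crm-interactions", "email-engagement"],
    reuse_or_extend="extend email-engagement with account-level rollups",
    value_to_deliver="weekly engagement view without manual exports",
)
```

Forcing the reuse question into the template is one way to keep the bigger picture in view before a new, single-use-case product gets built.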
In the long run, Gunjan believes the ability to scale up is more important than speed to deploying a data product. When you rush to create data products, you will inevitably create a lot of tech debt, especially around scaling when the time comes, so focus more on the ability to scale. It might not feel like that at first, but it's especially important when you have many data products. It will be very frustrating when you have additional use cases and cannot easily scale because you wanted to release your single data product two weeks earlier. And focus on prioritization as well. Some use cases will have to wait, and that's okay.
According to Gunjan, you need to get pretty close to your domain business partners. Embed yourself in more business meetings discussing use cases and problem statements. You shouldn't be waiting for requests; you should be extracting that information in regular discussions. Fast feedback cycles leading to fast iteration are crucial.
There is no such thing as a future-proof tech stack. So Gunjan recommends 1) accepting that and 2) preparing yourself for graceful evolution to meet needs. You should always be asking what the risks to your platform are and how you can mitigate them. Modularize your tech stack so you can easily add and/or replace components when necessary.
In wrapping up, Gunjan talked about the importance of not focusing on the single use case but on how it plays into the bigger picture, the long term. It's easy to go down the path of focusing on the single use case, especially as you start out, but it will cause disruption to the business from a half-baked product. Product thinking, not project thinking.
Quick tidbit:
- Easy data product templates and a good centralized catalog are very important. Having a good catalog can help you identify gaps in your data product coverage as well.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Oct 25, 2022 • 15min

#146 False Dichotomies and Baseless Binary Choices - Why We Need New Thought Approaches in Data - Mesh Musings 33

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Oct 24, 2022 • 1h 6min

#145 From Failwhale to Massive Scale and Beyond: Learnings on Fixing Data Team Bottlenecks - Interview w/ Dmitriy Ryaboy

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/Please Rate and Review us on your podcast app of choice!If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see hereEpisode list and links to all available episode transcripts here.Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.Dmitriy's Twitter: @squarecog / https://twitter.com/squarecogThe Missing README book: https://themissingreadme.com/Building Evolutionary Architectures book: https://www.oreilly.com/library/view/building-evolutionary-architectures/9781491986356/In this episode, Scott interviewed Dmitriy Ryaboy, CTO at Zymergen and co-author of the book The Missing README.Some key takeaways/thoughts from Dmitriy's point of view:Organizational design and change management is "like a knife fight" - you are going to get cut but if you do it well, you can choose where you get cut. There is no perfect org, and there will be pain somewhere but you can influence what will hurt, and make it not life-threatening.There is too much separation between data engineering and software engineering. Data engineering is just a type of software engineering with a focus on dealing with data. We have to stop treating them like completely different practices.When communicating internally, always focus on telling people the why before you get to the how. If they don't get why you are doing it, they are far less likely to be motivated to address the issue or opportunity. This applies to getting teams to take ownership of the data they produce, but also to everything else. There is often a rush to use tech over talk. Conversation is a powerful tool and will set you up so your tools can help you address the challenges once people are aligned. Paving over challenges with tech will not go well.Build your data platform such that the central data platform team is unnecessary in conversations between data producers and data consumers. That way, your team won't become a bottleneck. Try to reduce cognitive load on the users - they shouldn't have to deeply understand the platform and its inner workings just to leverage it."Data debt is debt forever." You can certainly 'pay it down' but data debt typically has a much longer life than even the initial source system that supplied the data. Take it on consciously.Looking to hire or grow full-stack engineers for an ever-growing definition of stack (backend, frontend, security, ops, QA, UX, data...) is probably not a great idea, we can't keep piling new domains on people and expect them to be good at all of them. Instead, look to build full-stack teams, and tools that look and feel sufficiently similar that e.g. "data engineering" becomes very close to "backend engineering".Look for needless delays in work as a sign your organization isn't well aligned with what you are trying to accomplish. The cost of coordination should not be a huge bottleneck, especially at a smaller size organization.Any agreement like a data contract or API needs to be agreed to by both parties. 
Dmitriy shared some of his experience leading the team that built and scaled Twitter's initial data platform. They had to transform from a relatively simple model that just wasn't scaling into one that was far more scalable but obviously more complex. For example, in their MySQL setup they couldn't add any more columns, so they treated certain columns almost like a CSV within a single cell - you can imagine how difficult that was for analytics… There were also many hidden dependencies, so when changes were made, they would break all kinds of other systems. Part of his role was helping the team untangle that mess.

One thing they got right, according to Dmitriy, was moving to structured logging so there were at least standard columns that made it possible to combine data across applications (a rough sketch of the idea follows below). Dmitriy didn't have much specific advice on exactly how to standardize; his advice was mostly about the discussions around standardizing - always share what you are trying to accomplish and focus on the why much more than the how. The how follows from decisions made once you've understood the why. This advice came up throughout the conversation.

Twitter had a small central data team that had to either try to scale massively and still not be able to know all the business context necessary or - more sanely - build their data platform to keep themselves from being a bottleneck. They chose the second option. So when building their data platform, there was a focus on building tools that made the central data team unnecessary in most data conversations, getting the data producers and consumers to talk to each other instead of the central team playing interpreter and doing the work themselves.

To sum up some advice from what Dmitriy learned at Twitter: 1) go for structured logging; 2) make dependencies clear and known, and limit them where possible; and 3) build your data platform tools so your data team doesn't have to be part of the conversations between data producers and consumers, to enable scaling.

According to Dmitriy, when focusing on change management, again you have to focus on explaining to people the why of what you are doing. When you give people the reason - the pain you are trying to solve - most people then just want to know what you need from them, in his experience. And be prepared to reiterate the why a lot. It's better to state it too often than not often enough.

When asked what we can take from software engineering to apply to data mesh and data engineering, Dmitriy is very passionate that data engineering and software engineering really shouldn't be overly separate. Data engineering is just a type of software engineering, and we need to create the tooling to make software engineers capable of doing data engineering tasks. We can take a lot of learnings from software engineering and apply them to data; it shouldn't be seen as very different.

When thinking about data mesh especially, we are asking domains to pick up additional responsibilities. Dmitriy believes - and Scott has said it multiple times too - that you can't give a team more responsibility without giving them more resources. That can be people - especially with incremental capabilities - or something like a platform that reduces the cognitive load and workload of the new - and existing - responsibilities. Give people a platform they can leverage without needing to be experts in what it is doing under the hood.
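To make the structured-logging idea a bit more concrete, here is a minimal sketch of application events emitted as JSON lines carrying a shared set of standard fields, so logs from different services can be combined downstream. The field names and the log_event helper are hypothetical - this is not Twitter's actual schema or tooling.

```python
import json
import time
import uuid


def log_event(service: str, user_id: str, event_type: str, **payload) -> str:
    """Emit one structured log line (JSON) carrying the agreed standard columns."""
    event = {
        # Standard columns every service emits, so events can be joined downstream.
        "event_id": str(uuid.uuid4()),
        "event_time": time.time(),
        "service": service,
        "user_id": user_id,
        "event_type": event_type,
        # Service-specific details stay in a nested payload so the
        # standard columns remain stable and queryable across applications.
        "payload": payload,
    }
    line = json.dumps(event)
    print(line)  # in practice this would go to a log pipeline, not stdout
    return line


# Two different services emitting events that can later be combined
# on the shared columns (e.g. user_id, event_time).
log_event("timeline", user_id="u123", event_type="view", item_id="t987")
log_event("search", user_id="u123", event_type="query", terms="data mesh")
```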
Full-stack engineers, especially if we add dealing with data to the stack, are just going to be overloaded. Look instead to full-stack teams where people within the team have some specialty areas.

Dmitriy shared some thoughts about how Zymergen's data capabilities and team have evolved. Initially, a central team had ownership over everything, which kept teams from talking to each other as much as needed. That introduced needless delays because the coordination cost was so high. So they reorganized and had a number of the data engineers move into the domains. Those data engineers were better able to leverage the data platform because they had built it, and they could teach other domain team members how to leverage it better too.

A lot of issues can be handled with just a conversation, not a technical solution, according to Dmitriy. This might seem obvious, but many people try not to go down this road. Technology is obviously also important - think about when to use tech and when to use talk. Your team should build out the platform so the tech is handled and it becomes easy to have the better conversations.

"Data debt is debt forever," according to Dmitriy, meaning that data has a habit of sticking around for a very long time, even well past when the initial source system has been replaced. You can pay down that data debt, but you have to address it intentionally; a simple refactor usually won't do it. Right now it's far easier to update an API than to evolve data - we need to get to a place where things like versioning data aren't such a manual, pain-in-the-butt task.

Dmitriy ended with a few things: 1) indirection is your friend, but don't use indirection until you need it; 2) a good place to learn about building your architecture to be able to evolve is the book Building Evolutionary Architectures (kind of well titled, huh?); and 3) organizational design and change management is like a knife fight - you are going to get cut, but if you do it well, if you are a pro, you will choose where you get cut.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
