
Data Mesh Radio
Interviews with data mesh practitioners, deep dives/how-tos, anti-patterns, panels, chats (not debates) with skeptics, "mesh musings", and so much more. Host Scott Hirleman (founder of the Data Mesh Learning Community) shares his learnings - and those of the broader data community - from over a year of deep diving into data mesh.
Each episode contains a BLUF - bottom line, up front - so you can quickly absorb a few key takeaways and also decide if an episode will be useful to you - nothing worse than listening for 20+ minutes before figuring out if a podcast episode is going to be interesting and/or incremental ;) Hoping to provide quality transcripts in the future - if you want to help, please reach out!
Data Mesh Radio is also looking for guests to share their experience with data mesh! Even if that experience is "I'm confused, let's chat about this specific topic" - yes, that could be you! You can check out our guest and feedback FAQ, including how to submit your name to be a guest and how to submit feedback - anonymously if you want - here: https://docs.google.com/document/d/1dDdb1mEhmcYqx3xYAvPuM1FZMuGiCszyY9x8X250KuQ/edit?usp=sharing
Data Mesh Radio is committed to diversity and inclusion, including in our guests and guest hosts. If you are part of a minoritized group, please consider this an open invitation to be a guest - just hit the link above.
If you are looking for additional useful information on data mesh, we recommend the community resources from Data Mesh Learning. All are vendor independent. https://datameshlearning.com/community/
You should also follow Zhamak Dehghani (founder of the data mesh concept); she posts a lot of great things on LinkedIn and has a wonderful data mesh book through O'Reilly. Plus, she's just a nice person: https://www.linkedin.com/in/zhamak-dehghani/detail/recent-activity/shares/
Data Mesh Radio is provided as a free community resource by DataStax. If you need a database that is easy to scale - read: serverless - and also easy to develop for - with many APIs including gRPC, REST, JSON, GraphQL, etc., all of which are OSS under the Stargate project - check out DataStax's AstraDB service :) Built on Apache Cassandra, AstraDB is very performant and, oh yeah, also multi-region/multi-cloud so you can focus on scaling your company, not your database. There's a free forever tier for poking around/home projects, and you can also use code DAAP500 for a $500 free credit (apply under payment options): https://www.datastax.com/products/datastax-astra?utm_source=DataMeshRadio
Latest episodes

Jun 6, 2022 • 1h 10min
#85 The Move from Legacy to Leader in Data and Analytics - Interview w/ Immanuel Schweizer
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.
Immanuel's LinkedIn: https://www.linkedin.com/in/immanuel-schweizer-17839242/
In this episode, Scott interviewed Immanuel Schweizer, the Data Officer for EMD Electronics.
Some interesting thoughts and questions from the conversation:
1. Good governance starts at data collection - what are ethical and compliant ways to collect data from the beginning? This points to intentionality around data use stretching into the application: what should you collect that might not be part of the day-to-day application function but might lead to insights that generate a better user experience? And what are the ethical concerns?
2. Should we initially create data products to serve specific use cases, or should we focus on sharing data first and then shape what people consume most into data products? EMD is approaching data products from a different angle than most, using the second approach.
3. When looking at data mesh, should you start with the high data maturity teams or work to pull everyone up to at least a decent baseline maturity level? If you work with the most mature teams, will their challenges really be applicable to the not-so-mature domains? Can you find good reuse patterns to scale your mesh implementation?
4. Domain owners are much more willing to share data if they understand the use cases for their data and maintain control to prevent misuse. Reluctance comes from an incomplete picture causing concerns - the more visibility into how data can be and is being used, the more willing domain owners are to share. But understanding your end-to-end data supply chain is tough, especially to start. How do you evaluate when to spend the time with a domain to get them data mesh ready? If you need a high value use case to justify spending time with a domain, are you leaving many domains behind? This ties to #2 and #3.
5. Set your target picture, but be ready to adjust it along the way. The world is ever changing - don't lock in to an expected target outcome.
6. Good data governance is about speeding up 1) access to and 2) usage of data. EMD launched a data literacy program where employees spend the majority of a 10-week timeframe learning about data and how to make use of it. For Immanuel, making things tangible relative to data makes people much less hesitant to explore and use data.
7. You should make using data "part of the job" so it is tracked and part of the review process. Otherwise, you are missing out on a key incentive to leverage data.
8. How many people in your organization wish they could be leveraging data more often to make decisions? What's holding them back? Is it tooling, knowledge, incentivization, access, etc.?
9. How can we democratize insights? So much of insight generation is one-off - how do we make it scalable, shareable, and repeatable?
Per Immanuel, EMD's data mesh journey is not that typical: they are still getting their arms around centralizing data in a constructive way. Data was previously locked away in the domains, so they are starting their data mesh - or decentralization - journey by centralizing data in a certain sense. Wannes Rosiers mentioned this at DPG Media as well. Centralizing enables breaking down silos and starting from common ways of working, so there is more cohesiveness around data sharing. There are some concerns from the domains about how they maintain control to ensure compliant usage. As the teams learn how to put their data onto centralized infrastructure, Immanuel shared that they are simultaneously working on how they will hand more control back to the domains.

EMD is in the initial stages of their data maturity journey, but they are mapping out how they plan to move forward. They are focused on giving data producers visibility into how their data is used so the domain owners can feel comfortable. And Immanuel offered a good insight for those starting their data mesh journey: understanding the end-to-end data supply chain is really hard when you are just establishing that supply chain. Yet understanding that supply chain - how data is consumed downstream - is very important to giving the domains the visibility they need to feel comfortable sharing more and more of their data. Some are calling this "data on the threshold" - data that is on its way to being data on the outside but isn't there yet.

Immanuel and team are starting their data mesh journey by bringing a significant amount of data into a central data lake and then watching for data consumption patterns to emerge. When a worthwhile use case appears, the data behind it gets promoted to data product status. Then they work backward to find the owners so the upstream data production is actually managed as a product. Since data challenges are then flagged to the owning domain rather than a centralized team, the data can be fixed where it should be fixed - upstream, at the source systems.

Based on that approach of getting access to data outside of data products, Immanuel mentioned how this can bring domains to the table sooner in a data mesh implementation. In his view, it is often quite expensive to get a domain on board and capable of sharing data like a product. So if you require a specific, data consumer-driven use case before investing enough in a domain that they can share data, it can mean you only look for very large return use cases - or the initial cost of bringing a domain onto the mesh falls disproportionately on your early data products. In Immanuel's approach, the initial cost of sharing is much lower, and then use cases emerge to further justify the work, so more domains share sooner. The drawback is that the domain isn't sharing its data intentionally upfront, so the data may not carry as much domain context.
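To make that consumption-first promotion path concrete, here is a minimal sketch of mining a central lake's query log for the most broadly reused datasets - the candidates to promote to data products. The log schema, dataset names, and threshold are invented for illustration; nothing here is from EMD.

from collections import defaultdict

# Illustrative records from a central lake's query/access log (hypothetical schema).
query_log = [
    {"dataset": "sales.orders", "consumer_team": "marketing"},
    {"dataset": "sales.orders", "consumer_team": "finance"},
    {"dataset": "sales.orders", "consumer_team": "supply_chain"},
    {"dataset": "hr.badge_swipes", "consumer_team": "facilities"},
]

# Count distinct consuming teams per dataset - breadth of reuse,
# not raw query volume, is the signal of data product potential.
consumers = defaultdict(set)
for entry in query_log:
    consumers[entry["dataset"]].add(entry["consumer_team"])

PROMOTION_THRESHOLD = 3  # assumed cutoff; tune to your organization
candidates = [ds for ds, teams in consumers.items() if len(teams) >= PROMOTION_THRESHOLD]
print(candidates)  # ['sales.orders'] -> work backward to this dataset's owning domain

Breadth of reuse (distinct consuming teams) is chosen over raw query counts to match the idea that promotion should be driven by demonstrated cross-domain demand.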
Immanuel mentioned that you can't centralize ownership of data quality and access and expect to scale - you really need to figure out how to distribute your data ownership appropriately. And you can't rely on having a data engineer in every domain, so how do you lower the bar to sharing?

Immanuel's approach to the big picture is to set a target picture and a north star but adjust the expected target picture along the way. If you aren't flexible, taking in new information and adjusting accordingly, are you really ready for the flexibility required to do data mesh?

So how is EMD approaching their data strategy? Per Immanuel, they started with a company-wide data strategy - again, setting that high-level target picture. Then they started to measure data maturity across each domain. They had to decide whether to move ahead with domains that already have high data maturity or to level up the capabilities of less mature domains so they can participate in a data sharing economy. They chose to focus on bringing all domains up to a certain maturity level - if you don't, it can mean issues scaling your approaches, per Immanuel. For example, if one domain is doing MLOps with tens of ML models in production and another domain is running everything off Excel and Access - or worse, PowerPoint - their needs may be completely different. The things you learn and ways of working you get from the very data mature team just won't translate well to immature domains, so every domain will need specialized help. That just isn't scalable.
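As an illustration of what "measure data maturity across each domain and set a baseline" might look like in practice - the dimensions, scores, and cutoff below are invented, not EMD's actual rubric:

# Score each domain 1-5 on a few maturity dimensions (invented rubric).
domains = {
    "manufacturing": {"tooling": 4, "data_ownership": 3, "governance": 4},
    "logistics":     {"tooling": 2, "data_ownership": 1, "governance": 2},
    "r_and_d":       {"tooling": 5, "data_ownership": 4, "governance": 3},
}

BASELINE = 3.0  # assumed minimum average maturity to participate in the mesh

for name, scores in domains.items():
    avg = sum(scores.values()) / len(scores)
    status = "mesh-ready" if avg >= BASELINE else "needs leveling up first"
    print(f"{name}: {avg:.1f} -> {status}")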
Immanuel mentioned how data governance can often create mixed feelings. He said good data governance is like the brakes in your car - brakes are there to let you go fast safely. Governance is not about overseeing every bit of data usage; it's about speeding up access to and usage of data. In a best-case scenario, governance is an enabling factor. That is why federated governance can be so powerful - we give control to the people most worried about data usage and give them the tooling/knowledge to own most of that governance.

EMD created a general data literacy campaign, making everyone aware of and on the same page about definitions for a lot of core concepts around data, analytics, use cases, etc. - making them all aware of the vocabulary. That was just the first step though. Immanuel has seen data literacy success come from programs that take people by the hand and have them actually use the systems - the workflows and mechanisms - especially around governance. Hence the 10-week program focused on using data. He said it is looking successful because it makes things so much more tangible: people can understand the entire flow of data through the system and then have a better idea of what they could do. It's only just wrapping up its first cohort, but the attendees seem very excited about it.

In Immanuel's view, people are naturally curious about data. The issue has been a lack of tools and access to data in the domains - sharing and leveraging data wasn't viewed as part of the typical job in most domains, and it wasn't part of most people's KPIs or reviews. It is important to give people the right tools and the right incentives so they can and want to explore data and be curious. Pushing reports at people doesn't engage curiosity. Who is this new data and analytics world for? For Immanuel, it's for the people who are already data curious but aren't leveraging data nearly as much and/or as scalably as they could be - we need to give them the tools to make working with data more scalable and sharable. Then data becomes a topic that is tangible for everyone in the business - can we start replacing Excel and PowerPoint but still make it simple for people to explore the data?

Per Immanuel, part of moving towards scalable analytics is unlocking one-off insight generation so we can democratize the generated insights. There are too many instances of people generating good and useful insights that are simply lost - seen by only a single person and/or only that one time. How do we make an easy, happy path to sharing insights that are long-lived?

When asked about the initial ROI on data mesh and the big data literacy campaign to date, Immanuel pointed to the main value thus far: giving people the capability and encouragement to explore data. Exploration has allowed people to understand the organization, and the team has found the most reused data - and then focused their data work on the places people use the most. That prevents time spent on data products that aren't valued by consumers.

In wrapping up, a question Immanuel thinks is important to ask: how do you define a happy place for your org and for each domain? No two journeys to a happy place will look alike - and no two happy places will either. What does a good state along the way look like? We don't need to be in a rush to get to the finish line.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Jun 5, 2022 • 29min
Weekly Episode Summaries and Programming Notes - Week of June 5, 2022 - Data Mesh Radio
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Jun 3, 2022 • 1h 9min
#84 Driving to Better Data Outcomes with Collaborative Questions - Interview w/ Jean-Michel Coeur
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.
Jean-Michel's LinkedIn: https://www.linkedin.com/in/jmcoeur/
In this episode, Scott interviewed Jean-Michel Coeur, Head of the Data Practice at the consulting company Sourced Group.

Jean-Michel has developed a simple three-question framework that works well with people asking for data, especially business counterparts. The questions typically lead to collaboration instead of confrontation and get data consumers to share what they want to accomplish with the data, not just their literal request. It feels more like a friendly chat than an interrogation or a "prove to me why this is worth my time". He also recommends following up each question with "the reason I am asking is..." to make explicit that you aren't pushing back, merely gathering information.

The three questions:
1. Do you know what this is for?
2. Do you know who is going to use it?
3. Do you know how they are going to use it?

Jean-Michel developed the framework after watching people struggle for years to properly request data and/or properly understand data consumers' use cases, often delivering solutions that did not meet business needs and wasted everyone's time. Oftentimes, the technical person wouldn't ask the right questions - or couldn't even get to the end data consumer - so they didn't really understand the reasons for the data ask.

For Jean-Michel, the first question - Do you know what this is for? - helps set the tone. It is not "why do you want this?", which often makes people defensive. He tells the person making the ask that with more context, his team can better understand how to make what they deliver better. And sometimes the person making the request will realize they aren't really sure what it will be used for and can go back to the end user. A key is to not be a gatekeeper to the data, in both reality and perception.

The second question - Do you know who is going to use it? - starts to drive towards who will consume the data output and how - the use case is pretty important for delivering valuable data, after all. For Jean-Michel, asking it this way can empower the person making the data request to lead the journey rather than undercutting them to get to the end user. It makes them part of the team. This is far better than the reaction he got previously when asking "who is the end user and what do they want?"

The final question - Do you know how they are going to use it? - means you can own the data user experience all the way through to end consumption. Oftentimes people deliver just the data into a warehouse or similar, but if that data feeds a dashboard, the dashboard is the user experience. This final question also helps data producers suggest additional value-add features or push back on ill-defined requirements - e.g. "real-time" rarely actually means real-time.
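Jean-Michel describes a conversation, not a form, but as a hedged sketch the three questions could also be baked into a data request intake record so every request surfaces its open questions - all names below are hypothetical:

from dataclasses import dataclass
from typing import Optional

@dataclass
class DataRequest:
    """Hypothetical intake record mirroring Jean-Michel's three questions."""
    requested_by: str
    what_is_it_for: Optional[str]        # Do you know what this is for?
    who_will_use_it: Optional[str]       # Do you know who is going to use it?
    how_will_they_use_it: Optional[str]  # Do you know how they are going to use it?

    def open_questions(self) -> list[str]:
        """Questions to explore with the requester - collaboratively, not as grounds for rejection."""
        gaps = []
        if not self.what_is_it_for:
            gaps.append("Do you know what this is for?")
        if not self.who_will_use_it:
            gaps.append("Do you know who is going to use it?")
        if not self.how_will_they_use_it:
            gaps.append("Do you know how they are going to use it?")
        return gaps

request = DataRequest("marketing analyst", "churn campaign", None, None)
print(request.open_questions())  # two questions to take back to the end user together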
Jean-Michel also recommends understanding what you are delivering well enough to be able to answer the question "okay, what does this mean?" After all, data projects/products are meant to deliver value, and the value is informing business decisions with data.

And again, a crucial part of the conversation is to add, after each question, an explanation of why you are asking. This isn't pushback, it's information gathering - it's how product marketing and product management are done! If you come at these questions from an empathetic angle, you all have a better chance of a positive outcome.

The output from the three questions is also immensely helpful to the data producing team. It ensures there is a real use case with real business value. It validates there is really an intention to use the data - otherwise Jean-Michel will push back. And it sets up a collaborative approach where the data producing team can make good suggestions for additional value.

Jean-Michel talked about the fallacy of the data "Field of Dreams": "if you build it, they will come." The users will come. The value will come. Instead, work with intentionality. Any data you produce to share should be useful to someone - if you aren't sure, have the conversation with potential users. Just storing data has a cost, and are you really going to be able to enrich it enough to make it useful in the future?

One thing Jean-Michel thinks many companies get wrong when building a data platform: there is a window of opportunity, yet many data infrastructure teams want to build everything themselves, pushing out the time to returns. The tendency to be tech-oriented often comes at the expense of user experience as well. Data infrastructure teams need to think about the primary reason for a data platform: collecting and producing data.

For Jean-Michel, the concept of a data product owner in data mesh might not extend far enough. If you are producing data for consumers, you should own the full user experience. If people are consuming from reports, you should own that report. Same for a dashboard. We have to be careful not to leave the less data literate behind while helping them up their data skills.

Jean-Michel wrapped the conversation by saying that every time you are asked to deliver data, be prepared to answer the "so what" question from a data consumer/stakeholder. You are a partner to the business, not just a "data report monkey", so add that value and build the relationship with your business counterpart.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

May 31, 2022 • 10min
#83 What is Your Actual Data TCO - Mesh Musings 17
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

May 30, 2022 • 1h 11min
#82 A Better Way to Map Domains? & Searching "For" Data, Not Just "In" Data - Interview w/ Ole Olesen-Bagneux
In this episode, Scott interviewed Ole Olesen-Bagneux, an Enterprise Architect who focuses on data at GN and the author of an upcoming book on data catalogs with O'Reilly. The two main topics were: 1) how can we better understand and handle data domains, and 2) searching for data rather than just in data.
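To make the "for" vs "in" distinction concrete - a minimal sketch with an invented catalog, not something from the episode: searching for data queries a catalog's metadata to discover which datasets exist at all, while searching in data queries the rows of a dataset you already know about.

# Searching FOR data: query the catalog's metadata, not the rows themselves.
catalog = [
    {"name": "crm.customers", "description": "Customer master data", "tags": ["customer", "pii"]},
    {"name": "web.clickstream", "description": "Raw site events", "tags": ["behavioral"]},
]

def search_for_data(keyword: str) -> list[str]:
    keyword = keyword.lower()
    return [
        d["name"] for d in catalog
        if keyword in d["description"].lower() or keyword in d["tags"]
    ]

print(search_for_data("customer"))  # ['crm.customers'] - discovery via metadata
# Searching IN data would instead be e.g. SELECT ... FROM crm.customers WHERE ...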

May 29, 2022 • 25min
Weekly Episode Summaries and Programming Notes - Week of May 29, 2022 - Data Mesh Radio
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

May 27, 2022 • 1h 27min
#81 Finding Useful and Repeatable Patterns for Data - Interview w/ Shane Gibson
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.
Shane's LinkedIn: https://www.linkedin.com/in/shagility/
Shane's Twitter: @shagility / https://twitter.com/shagility
AgileData.io website: https://agiledata.io/
AgileData Way of Working: https://wow.agiledata.io/
Shane's Podcasts: https://agiledata.io/podcasts/
In this episode, Scott interviewed Shane Gibson, CPO/Co-Founder of AgileData.io and Agile Data Coach.

A few takeaways from Shane to start:
- Agile methodology is about finding patterns that might work, trying them out, and deciding to iterate or toss the pattern out. It's going to be hard to directly apply software engineering patterns to data, but we should look there for inspiration and then tweak.
- Any time you look at a pattern you might want to adopt, or evaluate whether a pattern is working for you, ask yourself: will this/does this empower the team to work more effectively?
- Applying patterns is a bit of a squishy business. Get comfortable with the fact that you won't be able to exactly measure whether something is working. But also have an end goal in mind for adopting a pattern - what are you trying to achieve, and is this pattern likely to help you achieve it?
- Share your patterns, not only to help others but to get feedback and maybe ideas to iterate your pattern further.

Shane's last 8 years have been about taking Agile practices and patterns and applying them to data as an Agile Data Coach. Those patterns required a lot of tweaks to make them work for data. A big learning from that work: when applying patterns in Agile in general, and specifically in data, each organization - even each team - needs to test and tweak/iterate on patterns. And patterns can start valuable, lose value, and then become valuable again. Shane gave the example of daily standups: they drive collaboration as a forcing function but lose value once that collaboration becomes standard team practice. If there is a disruption to the team and collaboration is no longer standard practice, daily standups could regain value. So how do we apply these Agile concepts to data?

Currently, Shane sees no real patterns emerging in the data mesh space. It is quite early - patterns often take 5-8 years to develop, and data mesh is maybe 12 months into even moderately broad adoption. Data mesh is also such a wide practice area that patterns will need to cover many different aspects. That lack of patterns makes it quite hard even for those who want to be on the leading edge of implementing data mesh instead of the true bleeding edge - having to invent everything yourself is taxing work! So we need companies to really take existing patterns, iterate on them, and then tell the world what worked and what didn't. If people aren't sharing patterns, data mesh is going to be hard to adopt for many organizations.
Shane believes it will likely be pretty hard for many organizations - or at least many parts of large organizations - to give application developers in the domain the responsibility of creating data products. If your domains aren't already quite technically capable of building software products, it's going to be very hard for them to really handle data needs. For domains running on large, out-of-the-box enterprise platforms or SaaS solutions instead of rolling their own software, will they really have the capability to manage data as a product? If the domain doesn't have the most complex data, maybe. But if they do, are they really mature enough to handle it? A very valid question.

To really be agile using Agile methodologies, you need to first adopt the Agile mindset and not just the patterns and practices, per Shane. Agile is really about experimenting with a pattern and either iterating to make it better or throwing it out. It's not about being precious. As mentioned earlier, you should also throw out patterns that were once effective but aren't helping you any more. You need to do the same at the team and organizational level if you are going to successfully implement something like data mesh. Your teams and your organization are like a living, changing, evolving organism - treat them as such.

A very important point Shane made: data mesh isn't a solution. At most, it is a way of approaching your organization's data and analytics challenges, and it should be pursued with a true purpose in mind. The purpose isn't implementing data mesh; the purpose is a business objective or challenge, and data mesh is helping you tackle it. Also, data mesh is not the right solution for many organizations, especially smaller ones or ones without highly complex data needs. Those organizations should review data mesh, understand the principles, and work towards them - but if your real challenge isn't a centralized team bottleneck, don't take on the pain of decentralizing just to be hip and trendy.

For those who haven't really dealt with Agile, a "fun" potential learning, per Shane: there isn't really a great pattern for measuring whether a certain pattern is working. Proving how well something works is pretty much impossible, so a large part of it is feel. We chose this pattern to improve collaboration - do we believe our collaboration has improved? If yes, great, let's iterate and improve a bit more. If no, or our collaboration has even gone down, get rid of it!

Per Shane, when evaluating whether your Agile methodology is effective, ask: does the organization empower this team to work effectively? You will probably need to look at this team by team and repeatedly ask the question over time. Trying to scale Agile to fit all teams in an organization is often an anti-pattern. And if you are in a hierarchical company, adopting Agile patterns alone is probably not going to change the way you work in the long run - you need to break the hierarchies in some way.

For Shane, there is a big question that data mesh has yet to answer: can we really move data production and ownership to the application developers? He thinks that if we look at DevOps, where developers took on the necessary work for testing and CI/CD, we can. But then the even bigger question is how: how can we map the language of what needs to get done to software engineering semantics?
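One hedged sketch of that mapping - an illustration, not Shane's prescription: express the data obligation as a contract test that the producing application's team runs in its own CI, the same way DevOps moved testing left. The field names and types below are invented.

# A minimal data contract check a producing team could run in CI (illustrative).
CONTRACT = {
    "order_id": int,
    "customer_id": int,
    "order_total": float,
}

def validate(record: dict) -> list[str]:
    """Return contract violations for one produced record."""
    errors = []
    for field, expected_type in CONTRACT.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
    return errors

# In CI, run against sample/synthetic output and fail the build on violations:
assert validate({"order_id": 1, "customer_id": 7, "order_total": 19.99}) == []

The contract lives next to the application code, so a schema change that would break downstream consumers fails the producer's build - a software engineering semantic developers already understand.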
For Shane, the idea of a proof of concept - PoC - is just broken. We need to rethink it entirely, especially for data mesh. What are you really trying to prove out? He believes there are typically two types of PoCs, and most teams default to Type 1 when potential beneficiaries expect the output of Type 2. In a Type 1 PoC, you are out to prove a high-level hypothesis that has lots of uncertainty. It's about experimentation, done in a "quick and dirty" way that is not ready for production - the output is a proven (or disproven) hypothesis, not a production-ready result. Type 2 is a minimum viable product or minimum valuable product: what can we strip away from our end goal to get to something that can be used and is - mostly - productionalizable? Literally, what is the minimum that is viable? It is about proving the capability to deliver, and delivering something of value sooner. So ask yourself: what are you really trying to prove in your PoC?

Shane finished on three points:
- Empower your teams to change the way they work.
- Stop vendor and methodology washing of data mesh.
- Share which data mesh patterns you are trying to adopt, why you chose them, and what is working/not working. Data mesh can only evolve into something really great if we work together and share more information.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

May 24, 2022 • 12min
#80 A Data Mesh Prisoner's Dilemma? - Sharing with the Community - Mesh Musings 16
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

May 23, 2022 • 1h 12min
#79 A Data Success Secret Recipe: Comfort with Ambiguity and Change Management - Interview w/ Vincent Koc
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.
Vincent's LinkedIn: https://www.linkedin.com/in/koconder/
In this episode, Scott interviewed Vincent Koc, Head of Data at the merchant platform company hipages.

To start, some big takeaways from Vincent:
- If you aren't comfortable with an agile mindset and ambiguity, the bleeding edge probably isn't for you - and that's okay! You and your organization need to be comfortable with failing, learning, and then iterating.
- To get data mesh - or really any big change initiative in data - right, you should focus on change management much more than you probably think. It ends up being the secret sauce, or the crucial missing factor, much more often than the tech.
- Think problem-specific, not technology-specific. It's easy to over-engineer the problem - technologists want to technology.
- In general, consistency is key to achieving widespread success in data. One domain having a major success won't lead to broader org-wide success if you don't leverage reusability to make consistency easy across other domains - a bunch of great but non-consistent solutions doesn't add up to a valuable whole picture.

For Vincent, every organization considering data mesh should ask whether it is really the correct approach for them. Data mesh isn't for a large subset of organizations, whether that is right now or ever. If your organization doesn't have an appetite for change, it's going to be very tough to move towards data mesh. If you do want to implement data mesh, he recommends embracing an agile methodology - e.g. fast feedback and trial and error.

When thinking about splitting your data monolith into domains, Vincent recommends taking a lot of learnings from what works well in the microservices realm. You shouldn't decompose everything all at once - that just creates chaos. Split out larger domains one by one, and figure out whether to split them further when there is more value in doing so. Peel them off instead of taking a big bang approach.

Vincent believes that, in general, ~20% of your teams will consume ~80% of your data team's time and energy. There are a few ways to work with those teams to reduce that, but it is also somewhat a fact of reality - whether because those domains are more prominent, noisy, or well loved, or for many other reasons. That disparity in data work often leads to those areas being more data mature.

On disparate data maturity, Vincent talked about the need to drive all domains that will participate in something like data mesh to at least a common base level of maturity. A domain has to be relatively mature to be at what he called "mesh-level capability". Domains that aren't there yet will still need to rely more heavily on centralized data teams and capabilities while they improve their data maturity. And it's okay to work with them closely to raise their maturity level - just telling them to catch up is probably not going to work; there will need to be a bit of hand-holding.
Vincent believes embedding data analysts - whether you call them data analysts, analytics engineers, or something else - into domains is crucial, especially if you are going to attempt data mesh. They serve as the custodians of the domain's data, whether that is data shared with others via a data product, as in data mesh, or data the domain regularly consumes from internal and external sources. One point Vincent thinks is crucial to moving forward in data mesh: those data custodians need to help disseminate data knowledge to software engineers, and the organization needs to build tools and frameworks that make it easier for software engineers to own and manage data.

When asked about the balance between long and short-term planning, Vincent said that with an agile methodology, you need to "cater to today" and not get overly focused on an exact long-term plan/roadmap. Things will change. Set yourself on a good path, set that North Star, and keep your ears and eyes open for the signals that you need to change your plan. E.g. look at what a domain could become with the right direction and set it on that path to positive evolution as best as possible - but don't overly define the path.

Vincent talked about how crucial - and how often overlooked or ignored - good change management is for data organizations, now more than ever given how fast the world and the data landscape are changing. It's crucial to break changes down into terms and/or actions that all constituents can understand. The organizational side is much more important than the technology in most respects. To drive buy-in, Vincent has seen giving people agency over their data work well: giving domains the trust AND the tools/frameworks/resources to manage their data gets those business leaders/domain owners to come to the table quite often. But you can't just hand them the responsibility without the additional help.

Vincent discussed some past approaches he would do differently now. He focused too much on the technology and on telling others exactly how something should work or look, instead of working with them to drive to the outcome and letting them figure out the right path - while helping them along the way. Give them the big picture and talk to the outcome rather than the technology. To do that, Vincent talked about building out a pattern but not the whole picture: give them a defined enough idea of a good outcome, but not an overly defined one. And build in easy path or golden path boilerplates where it makes sense, through templates with extensibility - if things are too rigid, people will thrash against them. It is quite easy to over-engineer solutions - think problem-specific, not technology-specific.

Specific to data mesh, Vincent sees one of the big remaining open questions as how to actually automate and decentralize governance, especially things like access control. There aren't many good specifics out there on how to set up security as code in a scalable way. And we need to think about security as a sliding scale of risk.
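There is no settled pattern here, but as a hedged sketch of the direction - every dataset, role, and rule below is invented for illustration: access policies can live as declarative data owned by each domain and evaluated by shared platform code, with a sensitivity field capturing that sliding scale of risk.

# Illustrative policy-as-code: each domain declares who may read its data.
POLICIES = {
    "payments.transactions": {"sensitivity": "high", "allowed_roles": {"finance_analyst"}},
    "web.page_views":        {"sensitivity": "low",  "allowed_roles": {"analyst", "marketer"}},
}

def can_read(role: str, dataset: str) -> bool:
    policy = POLICIES.get(dataset)
    if policy is None:
        return False  # default deny: unknown datasets are not readable
    return role in policy["allowed_roles"]

assert can_read("finance_analyst", "payments.transactions")
assert not can_read("marketer", "payments.transactions")

Because the policies are plain data, domains can own and version their own entries (e.g. in their repo) while the evaluation logic stays central - one possible split between decentralized ownership and centralized enforcement.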
Vincent shared a few words of wisdom for data leaders near the conclusion. The first: it is easy to see something like data mesh, get very excited, and try to make a whole lot of changes at once. Making mass changes causes instability - instead, think about where you can be more targeted in support of the long-term big picture. Don't boil the ocean. The second: get comfortable with ambiguity, and consider whether your organization is aligned to accepting ambiguity - if you aren't comfortable with it, stay away from the bleeding edge. It's called bleeding for a reason. You need the ability to try/test, fail, learn, and then iterate towards a better solution. If failure isn't allowed - and maybe even celebrated - the bleeding edge is probably best left to braver souls.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

May 22, 2022 • 26min
Weekly Episode Summaries and Programming Notes - Week of May 22, 2022 - Data Mesh Radio
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf