
Data Mesh Radio

Latest episodes

Jun 11, 2023 • 17min

Weekly Episode Summaries and Programming Notes – Week of June 11, 2023

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Jun 9, 2023 • 13min

#229 Making 'Agile' Work in Data - Mesh Musings 49

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts (most interviews from #32 on) here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

My overall point here: why do so many folks in data hate Agile? Basically, because in data it's so rarely done well. Things are done because 'that is the way they are supposed to be done' instead of 'because this will make our teams happier and more efficient'. And quite honestly, Agile isn't for every organization. The spirit of Agile probably should be for every organization - so maybe go read the Agile Manifesto - but in data, one-size-fits-all approaches are obviously breaking down more and more. So work with your teams, talk about what you want to achieve, and collaborate with them to get there. Yes, easier said than done, but I believe in you.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Jun 5, 2023 • 1h 15min

#228 Keeping Your Eyes on the Prize: The Data Value Chain - Interview w/ Tina Albrecht

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Tina's LinkedIn: https://www.linkedin.com/in/christina-albrecht-69a6833a/

In this episode, Scott interviewed Tina Albrecht, Lead Coach for Data-Driven Transformation at Exxeta.

Some key takeaways/thoughts from Tina's point of view:
- Always start from your value chain - how do you actually generate value from data work? Any process or other tool you attempt to leverage that isn't focused on improving your data value chains will likely be ineffective in generating value. And why do data work if not to generate value?
- The two most likely reasons you are losing value in your value chain are a lack of clear ownership/responsibility and bottlenecks. Look to regularly assess both.
- When measuring whether things are good enough, the DORA KPIs are generally good measures of data process maturity. But also look at two aspects: A) how happy people are with the current process, from customers and decision makers up to the team - satisfaction is a great measuring stick because it is highly correlated with effectiveness; and B) how much effectiveness is lost to bottlenecks and constraints.
- The two ways most data mesh implementations seem to go wrong are a misinterpretation of Team Topologies and a lack of teams owning responsibilities. On the first, there are often breakdowns in how teams collaborate; on the second, we need the platform team to own enabling domains, but the domains keep trying to push work back to the central platform team.
- It's important to regularly assess whether aspects of your data transformation are good enough for now. But it's also very important - and easy to lose sight of - how your teams are feeling during the transformation. If you significantly improve capabilities but everyone is miserable, will they leverage those new capabilities? Scott note: interesting perspective, and it factors into the rate of transformative change: constant seismic shifts are probably not great for morale.
- In data mesh, consider rotating your data engineers between domains so you build a larger group of data-capable people who truly understand domain challenges. Scott note: we still have to give embedded data engineers a career trajectory or they will leave.
- Many domains will truly not understand what data ownership really entails. Work with them and embed highly data-fluent people who can raise the domain's capability and understanding of data ownership.
- !Controversial!: Just having a simple community or guild for your data engineers and having them become part of the domain is not enough. There should be a central organizational home for data engineers: embed them into domains, but let the central team handle their career trajectory.
- Team Topologies is very helpful in data mesh, but if you don't understand your actual value chain, it won't work well. You need to start from how you generate value.
- When considering whether data mesh is right for an organization: if the organization doesn't have a clear vision of what would be better if they had better data capabilities, that's a major red flag. Do they understand how data can drive value for the organization specifically? Would data mesh align with their business strategy?
- Clarity - getting deep into how things actually interconnect - is crucial to doing a large-scale transformation like data mesh right. Too often, things are left up to vague interpretation and balls get dropped. Drive to specifics.
- ?Controversial?: Overcommunicate. Set up literal, mandatory information exchange mechanisms between domains - e.g. at the person-to-person level in workshops. Be explicit that domains need to be regularly exchanging information, not just the 1s and 0s of data. Scott note: this is an emerging pattern from teams who have cleaned up problem data mesh implementations. It seems like it really works.
- ?Controversial?: When trying to change your data processes, confusion is actually a positive sign. It indicates people are listening and trying to figure it out. If everyone thinks they understand all the changes without any confusion, they probably have wrong implicit assumptions. Scott note: human communication is silly sometimes, but this is an excellent point.
- While it can be hard to measure the value of data exactly, you should always be asking what the purpose of the data or data work is. What value does this provide? Why are we doing this work?

Tina started with some of the big questions you will need to consistently ask yourself in a data transformation: is this good enough? Is this good enough for now? What is changing and how is it changing? What are the milestones we want to hit and, looking back, that we have hit? And, often overlooked: how is/was the team feeling during the transformation? Those questions can help you narrow in on how well your transformation is going - and let's face it, data transformation doesn't stop. A good three-layer system for breaking this conversation down: the surface layer is what happened in the past and where things are now; one layer deeper is where things could go with stronger data capabilities; and the bottom layer is: okay, but how can we get there?

When actually measuring whether a current solution is "good enough", Tina recommends two measures or questions to consider. The first: how happy are people with the current process? Happiness is a decent measure of effectiveness - if people are happy, working hard to improve the process probably doesn't make financial sense; the return isn't there. The other aspect is to think about how much effectiveness of the process is lost to constraints and bottlenecks. This analysis will give you good insight into the process and the perspective on whether it is currently good enough.
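Tina points to the DORA KPIs as a useful maturity yardstick. As a purely illustrative sketch - the episode doesn't prescribe any implementation - here is how two DORA-style measures (deployment frequency and lead time for changes) could be computed in Python for a data product from a hypothetical list of release events; the Deployment shape and its field names are assumptions:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median

@dataclass
class Deployment:
    """One production release of a data product (hypothetical event shape)."""
    committed_at: datetime  # when the change was first committed
    deployed_at: datetime   # when it reached production

def deployment_frequency(deploys: list[Deployment], window_days: int = 30) -> float:
    """Average deployments per week over the trailing window."""
    latest = max(d.deployed_at for d in deploys)
    cutoff = latest - timedelta(days=window_days)
    recent = [d for d in deploys if d.deployed_at >= cutoff]
    return len(recent) / (window_days / 7)

def lead_time_for_changes(deploys: list[Deployment]) -> timedelta:
    """Median commit-to-production time; long lead times hint at bottlenecks."""
    return median(d.deployed_at - d.committed_at for d in deploys)

# Example: two releases of a hypothetical 'orders' data product.
deploys = [
    Deployment(datetime(2023, 6, 1, 9), datetime(2023, 6, 2, 15)),
    Deployment(datetime(2023, 6, 5, 10), datetime(2023, 6, 6, 11)),
]
print(deployment_frequency(deploys))   # deployments per week
print(lead_time_for_changes(deploys))  # 1 day, 3:30:00
```

Pair hard numbers like these with the softer satisfaction signal Tina describes; neither alone tells you whether the process is good enough for now.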
Tina talked about two ways most data mesh implementations seem to go wrong. The first is how teams interact with each other: the team and domain setup. There are often breakdowns in how teams collaborate, so things that should be explicitly owned and bridged between teams just keep getting dropped. The other is data ownership, where domains won't take data ownership and/or don't understand it. We need the platform team to own enabling domains to leverage the platform, but the domains keep trying to push work back to the central platform team - or don't own their data well enough, so the central team has to step in to help. When that happens, the platform team can't get the necessary platform work done to do data mesh and they become that centralized bottleneck again.

In data mesh, Tina believes there is still a lack of understanding by domains of what data ownership means - what they are actually supposed to be responsible for. You can help domains better understand data ownership by making sure they have the necessary embedded data engineering talent to actually be capable of owning data while more domain members learn how. And you need strong governance capabilities to help teams understand how to interoperate data between domains easily.

Tina talked about how, with one client, they are rotating embedded data engineers between domains - and the central platform team - so the domains become more data capable and there is a wider knowledge base about data across the organization as the data engineers share that knowledge with each other. Just having a simple community or guild for the data engineers wasn't enough; they needed the embedded model instead. That central hub, and a central team managing the data engineers' careers, has been very important to keeping people happy.

Similarly, Tina talked about how, while Team Topologies is a great tool for organizing your teams in data mesh, it's only a tool. If you don't understand your value chain - if you don't really focus on how you create value via data - it won't save your data mesh implementation. Start from value first.

Is data mesh right for your organization? When Tina is assessing that question for clients, a great starting question is simply: what changes, what value would come from doing data mesh? If there isn't a clear vision of what would be better and how that would drive clear value - and a large amount of value too, data mesh is not a light undertaking - then will data mesh really align with the business strategy and drive value? Scott note: I think these questions are REALLY crucial to answer. If you don't know what will change if you do this well and how that ties to business value, you shouldn't do it :)

For Tina, in data work in general, there are two big areas where value is blocked or lost along the value chain. As she mentioned earlier, lack of clear ownership and responsibility is a big one: people understand how value is generated, but it's unclear who owns what, so major needs along the value chain aren't met - basically, no one thinks they own crucial aspects, so those aspects don't get done. The other is, again, simply bottlenecks - where are things blocked, or where are dots not connected? Once you identify issues in either area, you have your targets for change to drive more value in those specific processes.

A key aspect of transformation for Tina is deep clarity. There are so many things changing: who owns what, and what outcomes do they own? What is actually being done and why? There needs to be strong governance leadership that lays out many aspects rather than leaving things to chance. It doesn't have to be heavy-handed governance, but ensuring things will work together - and that someone _owns_ making them work together well - is the best way to ensure a successful data transformation, data mesh or otherwise. And you have to stay on top of things; it's not a one-and-done kind of transformation.

Intentionality around communication is also crucial to a successful data transformation for Tina. Overcommunication is a virtue. Data isn't about the 1s and 0s, it's about sharing information. So be explicit in setting expectations and creating mechanisms for people to exchange information, especially across domains. Have regularly scheduled workshops to actually get people exchanging crucial context - like a good relationship, you need to keep working on your communication. Many think about information exchange at the actual 1s-and-0s level, but we need people to exchange information with each other constantly too. Otherwise there are too many incorrect implicit assumptions and, again, balls get dropped and value is needlessly lost.

Tina made a somewhat comical but very true point: when you are making a large-scale change to how you do data work, if no one is saying they are confused, it's probably a bad sign. Large-scale change is difficult and inherently at least a bit confusing for most people, so if no one is speaking up, they probably have some bad implicit assumptions you need to address - but you don't know what they are. At least with confusion, you can drill into where they don't get it. Lean into confusion; it creates the perfect situation to actually exchange context and get people on the same page.

While it is incredibly difficult to put an exact value on data and data work - it will be valued differently by different people - Tina still asks people what the purpose of the work is. Why do we care about this data? That will tell us its general value, if not a specific dollar figure.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Jun 4, 2023 • 16min

Weekly Episode Summaries and Programming Notes – Week of June 4, 2023

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Jun 2, 2023 • 59min

#227 Panel: Creating a Data Mesh Platform (1st Iteration) - Led by Paolo Platter w/ Manisha Jain, Jean-Georges Perrin (JGP), and Max Schultze

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.

Paolo's LinkedIn: https://www.linkedin.com/in/paoloplatter/
Paolo's Medium (multiple data mesh articles): https://medium.com/@p-platter
Agile Lab's website: https://www.agilelab.it/
Manisha's LinkedIn: https://www.linkedin.com/in/evermanisha/
'A streamlined developer experience in Data Mesh' blog post by Manisha: https://www.thoughtworks.com/insights/blog/data-strategy/dev-experience-data-mesh-platform
'A streamlined developer experience in Data Mesh (Pt. two)' blog post by Manisha: https://www.thoughtworks.com/insights/blog/data-strategy/dev-experience-data-mesh-product
'Data Mesh Accelerate Workshop' blog post by Thoughtworks: https://martinfowler.com/articles/data-mesh-accelerate-workshop.html
Max's LinkedIn: https://www.linkedin.com/in/max-schultze/
Max's Data Mesh Learning meetup presentation: https://www.youtube.com/watch?v=QwtTdP2wKFo (he has many more on YouTube! https://www.youtube.com/results?search_query=max+schultze+data+mesh)
Data Mesh in Practice ebook he co-authored (Starburst info gated): https://www.starburst.io/info/data-mesh-in-practice-ebook/
JGP's LinkedIn: https://www.linkedin.com/in/jgperrin/
JGP's 'Data Mesh for All Ages' book: https://jgp.ai/2023/01/20/data-mesh-for-all-ages/
JGP's website (lots of data mesh content): https://jgp.ai/
JGP's blog post 'The next generation of Data Platforms is the Data Mesh': https://medium.com/paypal-tech/the-next-generation-of-data-platforms-is-the-data-mesh-b7df4b825522

In this episode, guest host Paolo Platter, CTO and Co-Founder of Agile Lab (guest of episode #3), facilitated a discussion with Manisha Jain, Data Engineer at Thoughtworks (guest of episode #220), Jean-Georges Perrin (AKA JGP), Intelligence Platform Lead at PayPal (guest of episode #130), and Max Schultze, Associate Director of Data Engineering at HelloFresh (guest of episode #21). As per usual, all guests were only reflecting their own views.

Scott note: I wanted to share my takeaways rather than trying to reflect the nuance of the panelists' views individually.

Scott's Top Takeaways:
- As with every aspect of data mesh, focus your data platform work on what best drives value - incrementally, and in both the short and long term. Look to start at common friction points in your org's data work, not just data mesh work, e.g. automated provisioning. That gives users a good first experience that is incrementally better than existing data work, potentially driving buy-in.
- It's _really_ crucial to understand user personas and the necessary capabilities for each persona. Who is using the platform, why are they using it, and what are their points of friction? The experience plane will become incredibly important as you move along, but it probably shouldn't be your initial focus.
- The word platform, especially in data mesh, can be thought of as a plural. Don't focus on creating a 'single platform to rule them all'; focus on delivering capabilities that reduce friction and support the scalability and reliability/trustability of producers' data work. Users don't care how it fits together - create a system that can evolve and scale!
- You must treat your platform as a product itself - think about how well it satisfies user needs, how good the user experience is, etc. That also means your platform has a lifecycle where you add and eventually prune features. From the start, apply product thinking to your platform :)
- Prioritization of platform capabilities will probably always be a challenge once you have more than a few domains using the platform. Balance being reactive and proactive with the need to generalize solutions to fit many use cases. Again, look to product management for advice on how to manage your roadmap and priorities.
- REALLY think about how to handle breaking changes. They are likely inevitable and that's okay, but you need to focus strongly on communication and on limiting the impact / providing a gentle migration path. Don't break without really needing to, and do so sparingly.
- Data integration, especially in a highly regulated industry, is a challenge you will face relatively early on. It's likely not going to be easy; be prepared for that.
- You need to understand where there is friction in the data product creation/management process - that's where your platform should focus. That might feel obvious, but it's not how data platforms have historically been built. Basically, look to automate away unnecessary friction first, even if that means focusing on things like templates and blueprints instead of cool tech (see the sketch just below for a toy blueprint).
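To make the templates-and-blueprints idea concrete: here is a minimal Python sketch of what a data product blueprint could look like - a declarative spec a domain fills in, which the platform expands into provisioning requests. This is only an illustration under assumed names (DataProductBlueprint, scaffold, the s3://mesh/ path, the default checks), not anything the panel specified:

```python
from dataclasses import dataclass, field

@dataclass
class DataProductBlueprint:
    """Declarative spec a domain team fills in; the platform scaffolds the rest."""
    name: str
    domain: str
    owner_email: str
    output_format: str = "parquet"       # platform-wide default
    refresh_schedule: str = "0 6 * * *"  # cron: daily at 06:00
    quality_checks: list[str] = field(
        default_factory=lambda: ["not_null:primary_key", "freshness:24h"]
    )
    contains_pii: bool = False           # triggers a stricter access policy

def scaffold(bp: DataProductBlueprint) -> dict:
    """Expand a blueprint into concrete provisioning requests (stubbed here)."""
    return {
        "storage_path": f"s3://mesh/{bp.domain}/{bp.name}/",
        "schedule": bp.refresh_schedule,
        "checks": bp.quality_checks,
        "access_tier": "restricted" if bp.contains_pii else "internal",
        "format": bp.output_format,
    }

# A domain declares a product in a few lines instead of raising platform tickets.
orders = DataProductBlueprint(
    name="orders_daily", domain="sales",
    owner_email="sales-data@example.com", contains_pii=True,
)
print(scaffold(orders))
```

The design point the panel kept returning to: the platform's value is in the defaults and the automation behind a spec like this, not in any single cool technology underneath it.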
Other Important Takeaways (many touch on similar points from different angles):
- There are many ways to potentially get started - probably too many to list. But think about getting to early necessary capabilities that deliver value. It's easy to get bogged down in technical aspects - instead, start specifically by asking what creates value quickly.
- If you aren't ready to build your platform incrementally, you probably aren't ready to do data mesh. You need to be comfortable demonstrating value and building as you learn and as your needs progress.
- When you are early in your journey, discoverability/usability is a characteristic many might overlook - you are building data products to support a use case. But if you want to drive buy-in and win incremental users of those data products from additional domains, discoverability might be a very important early capability. What is your adoption leverage point?
- There's an interesting balance between an MVP for the platform and easily enabling MVPs for data products. It's hard to say exactly where the line falls. Scott note: Glovo said in episode #139 that they wish they had focused a bit more on making it easier to launch and initially manage data products over other more advanced capabilities.
- Consider when to start saying you actually have a platform - that can be a bit of a political statement. Potentially wait until you have started to build out the data experience plane for data producers. If there isn't a tangible way to interact, the platform may exist but users don't really know what it is or what it enables them to do. But the experience plane shouldn't be among the first capabilities you build either :)
- Your data platform work needs to focus on capabilities and enabling value delivery. But that value delivery needs to be visible to business users. Basically, find your value leverage points that are visible to users and focus on satisfying those when you can, to drive buy-in that you're delivering value. It can be a political game, unfortunately.
- You can't treat domains as if they are the same or, especially early on, even similar. There will be different value drivers/needs and often very different capabilities, so you have to make sure your platform has the necessary capabilities to drive value for each domain. If it doesn't yet, or won't in the near future, it might not be time to partner with that domain - and that's okay!
- At the start of your journey, you will probably focus on building capabilities to serve specific needs - of course, try to build those in generalized ways to serve many needs, but still. As you evolve and grow the platform, focus more on your overall suite of capabilities and look for gaps. Don't only wait for capability requests - but also don't just build cool things because they are fun to build :)
- Start to build out data product specifications based on early data products. Those will provide a much easier path for later data products, whether built by new domains or by existing domains already using the platform.
- Charging models and who should own which costs will play a bigger role than you'd probably like. It's a challenge you need to address at the platform level, even while (presumably) others decide who should actually own the costs.
- Do not focus too much on building capabilities to _launch_ a data product. The launch is just a single day in the life of a product. You need to build capabilities that also help domains maintain and evolve their data products.
- Should the bounded context of a data product drive all the way down to the resourcing - basically, should there be zero shared resources between data products? This one is a bit controversial: it's a key aspect of Zhamak's vision, but people worry about costs. If you believe in complete separation, how do you make sure there aren't overlaps in computational resources and storage? It won't be easy, because it's not how most data services have historically been built.
- Many data engineers will still be thinking waterfall. Be prepared. Agile can work very well if it's done well - but it's not done all that well in data all that often… SAFe seems to be less hated for doing Agile in data - it's pretty widely panned in software - but it _might_ provide a bridge to doing Agile in data. Something to investigate, but definitely not advice/guidance.
- Paolo essentially said (heavily paraphrased/edited for flow): 'We really need to pay attention to optimizing the ratio between the value we bring with new features and the satisfaction of various users. This drives prioritization across phases, whether the focus of a phase is adoption, productivity, satisfaction, etc. What is the outcome we want to achieve in each phase? Then we use that to prioritize features.'
- Every mesh platform build-out will be different, especially depending on which capabilities are most valued in your organization. For some, that might be security/privacy governance for regulators. For others, it might be usability. Find your value leverage points, and find what the people with the purse strings will value most.
- Measuring the return on investment - even just the success - of a data mesh platform will be tough, but it's important to start early and _start_ to get your arms around it. Productivity gains are a pretty easy measurement to consider as you get more sophisticated. Look to measurements of return on investment in software for ideas, but be comfortable that things will always be a bit squishy.
- If you are building your platform as a product, remember that products focus far more on user experience (UX) than most aspects of data infrastructure have historically. Make sure you really understand the value drivers - if users don't adopt your platform, nothing else you do with it really matters :)
- Producers being willing to own their data - and then understanding what that ownership really means - will probably be one of your biggest challenges. It's hard to tackle that at the platform level, but look to support those efforts as best you can.
- Data producers and consumers still don't generally talk to each other in many organizations. Look to put more mechanisms in place to foster communication. Scott note: see my mesh musing - episode #188 - where I argue all consumers MUST register their use case with producers.
- You will have to decide things like whether to isolate compute and storage resources at the domain or the data product level. Data producers don't care - make that invisible to them. The key aspect is to enable, not block: give domains the agency to get their work done, but find ways to reduce their friction and abstract decisions like that away from them. They are important decisions but irrelevant to users.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
May 29, 2023 • 1h 20min

#226 Learnings From Implementing Data Mesh at a Large Healthcare Company - Interview w/ Mike Alvarez

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Mike's LinkedIn: https://www.linkedin.com/in/2mikealvarez/

In this episode, Scott interviewed Mike Alvarez, former VP of Digital Services who led the data mesh implementation at a large healthcare distribution company. He's now working on his own startup.

Some key takeaways/thoughts from Mike's point of view:
- Lean in to the new value-creating possibilities that can come from empowering thousands of your colleagues to leverage data.
- As an industry, we have to learn to do data work in an incremental fashion. It hasn't been the norm and it can break people's perception of data work, but it's crucial to get where we want to go.
- You can drive data mesh buy-in from domains by showing them the freedom they will have. Autonomy, empowerment, going at their own speed, etc. can get many to lean in.
- Advice to past data mesh self: early in your journey, you can share your vision until the cows come home and people will say they understand - and probably think they understand - but it's incredibly easy to get misaligned. Really focus on what you are trying to achieve. What are the target outcomes?
- Similarly, it will be harder than you expect to drive buy-in. Many people say that, but it's still probably going to be harder than you expect even after hearing that :)
- We need to move away from old approaches to data for large companies, because the sheer scale of initiatives creates bloat and risk factors unto themselves. Small and nimble gives us much quicker time to value and builds to much greater outcomes.
- Shadow IT develops so domains can try to move at the speed of business. But it's rarely scalable or robust enough to support even the domain in the long run, and it certainly isn't built to integrate well with the rest of the organization. Try not to hold past shadow IT decisions against domains.
- Most teams - especially pre data mesh - don't truly understand the data they are ingesting. It's on consumers to get more information, but if producers aren't helping them, teams will ingest what they can even if they don't fully understand it. Data they don't fully understand still drives value - but it could be driving so much more.
- Start from the problem first: what am I trying to solve? Do I need a new approach or can I use something I already have? Don't reinvent the wheel - but we might just have to reinvent doing data at scale, a la data mesh.
- Collect internal stories of past attempts with negative outcomes. What were the common reasons, the common patterns, for things failing or not delivering expected value? They are useful for perspective and for driving buy-in.
- Treating data as a product makes more and more sense the deeper you dig into it. But data as a product alone can't survive as an approach to doing data.
- When sharing information about data mesh, don't expect everyone to instantly understand or be on board. It will likely take a while in most organizations to build up the momentum to even consider starting a data mesh journey. Have patience.
- Data mesh really enables the teams closest to the customer, closest to the day-to-day business, to drive more value through data. It allows them to react much more quickly as the world evolves and to focus on the customer's problems.
- ?Controversial?: The operating model change with data mesh is what drives the real value. And lots of domains can get bought in when they see they get to own their own destiny and are empowered to manage their data like a product, instead of in non-scalable and quickly deteriorating ways. Scott note: I think we do need better tech to fully leverage the potential value of data mesh, but right now, I agree that most of the value is driven by operating model changes.
- A shared vision of what you are trying to achieve is important. It lets people rally around something and start to build a community, which is crucial to delivering on a data mesh approach.
- ?Controversial?: Don't try to force your domains, your lines of business, to leverage your centralized tooling and comply with optional governance (there is non-optional governance, of course). In exchange, pay back those who leverage central tooling by automating away toil where possible. Community is about give and get. Be a good member of the community.
- The three crucial dimensions of a product: viability, feasibility, and desirability. When adopting product thinking, ask whether your product satisfies all three.
- You need to communicate when something isn't feasible. Too often in data, people have just said no instead of 'no, and here is why…'. Let people in on your thinking and your prioritization process around what work to do when.
- Good product management skills are necessary to understand data as a product - to transition us from creating and sharing data sets to sharing high-value information exchanged via a data product. You need to delve into and understand the domain to figure out what would be most useful to share via a data product.
- !Controversial!: It might be time to completely let go of the concept of a "single source of truth." We've been chasing it in data for so long, but the cost/benefit is starting to look like it doesn't make sense. What are we trying to achieve - perfect data, or a strong understanding of the world and how it's changing? Scott note: strongly agree, and so does Zhamak.
- New, more correct information about aspects of the business is not always welcome. Unfortunately, you might get pushback if you attempt to tackle a problem that changes people's view of their business. So choose use cases, especially early ones, well :)

Mike started off with the general need for large companies to change their approach to analytics at scale. We've been doing a lot of the same things for the last 30 years and they aren't quick enough to respond to changing business needs - 6+ months and $1M+ to get to your first query just doesn't make sense anymore. Did it ever? And we can do better now. The business side of companies shouldn't have to wait for data and watch the world change well before a solution is delivered. We need to move at the speed of business.

Regarding shadow IT, despite having led a central data/IT organization, Mike doesn't hold it against domains. The lines of business can't deal with the bottlenecks of going through a central team, so they try to build things themselves. However, it's rarely all that scalable and certainly isn't built with sharing to the rest of the organization in mind. Shadow IT just isn't built with a product mindset, so it becomes brittle and dilapidated quickly. So the central team is a bottleneck, but the decentralized approaches don't scale. Add in teams generally not truly understanding the data they are ingesting - or often even producing - and it's a recipe for data underperforming expectations. Of course, this is what Zhamak identified and why she created data mesh.

Mike talked about how, when considering a new approach to data, he didn't want to do data mesh for the sake of it. What was the problem they wanted to solve? Could an existing approach or platform do what was necessary? What were the organization's past failure modes - times when things didn't meet expectations - and what were the common through-lines or patterns? He then took those past unmet expectations and used them for understanding as well as for driving buy-in. The definition of insanity is trying the same thing over and over and expecting different results. So if data projects were constantly not meeting expectations, shouldn't we change the way we approach data? Treating data as a product seemed like a great start, which led to selecting data mesh :)

While data mesh can feel like the right call to some immediately, that's not likely to be the universal reaction at any organization. Mike and team spent a number of months working out how this could work and building up the buy-in and momentum to even start their data mesh journey. This isn't an overnight approach; you really need to think deeply about how it could work and - back to those potential failure modes - how it could go wrong, so you can avoid heading down bad paths as best as possible. What really drove Mike's interest in data mesh as a possible solution was how it could enable the teams closest to the customer to react to customer and market needs, especially changes in customer demands/wants/challenges. It is about empowering teams to move at the pace necessary to stay ahead of the competition, instead of waiting for a centralized team to give them access to their own data or the data of teams close to them in the organization.

For Mike, the value of data mesh isn't about the technology shifts, at least not yet. It's about the operating model shift - giving domains the capabilities and empowerment to handle data. We are trusting them to own their data and giving them the ability to do so in a scalable way. We are giving them the ability to react in a much quicker and more meaningful way. All of these can get people leaning in to doing data mesh. But they don't care whether it's data mesh or any other paradigm. That's where data people need to connect the dots for them: how can this work, what benefit does it have for the domain, and what are the actual changes for them?

To get the most out of data mesh, Mike believes you have to have a strong vision of what you are actually trying to achieve. It's not an approach to take on lightly. You need to really think about aligning everyone around that shared vision and building it as a community effort. How do you take the principles and new approaches and focus on delivering business value - for each domain and for the broader organization too?

Mike believes a big part of doing data mesh is the social contract around enablement and empowerment. Sure, teams can go off in their own direction, but if they give up some of their autonomy to stick to the centrally provided tooling - which makes governance far easier - you need to give them something in return. In their case, Mike and team gave the gift of automating away a lot of the toil work :D

On advice to his past data mesh self, Mike talked about how, early in a mesh journey, people believe they are aligned on vision but probably aren't. Holding all of data mesh as a concept and then contextualizing it to your specific organization is a massive amount of work and cognitive load. If you try to get someone to fully understand that upfront, without seeing the progress, you will almost certainly have some misalignment and misunderstandings. Instead of the specifics, focus on the target outcomes: what are you trying to achieve? If people align on the benefits, you are more likely to gain and retain momentum. And it will take a lot of effort to get most people committed to the vision - just be prepared for that.

Mike talked about the three key aspects of a product: viability, feasibility, and desirability. Feasibility is a crucial aspect to consider in data - especially in data-as-a-product thinking - because often, something just isn't likely to work for a number of reasons. When there is desirability but not feasibility, you really need to communicate why it's not going to happen. With data mesh, there can be a misconception that a switch has been flipped and we can do any data work we can think of - and that we should! But that prioritization process, and understanding - and then communicating - the current art of the possible, is important. Always be communicating about what you are doing, when, and why.

Domain understanding is crucial to really understanding data as a product for that domain, in Mike's view. How do we move from serving data sets as if they were the product to creating the information that will be most useful to consumers about the domain, in a productized way? And then iterating toward more and more value as you improve the data product or suite of data products representing the domain. Easier said than done, of course.

Mike asked the provocative question of whether we still want to seek the fabled "single source of truth." It can be a bit like the dog chasing its tail - when you catch the tail, then what? Are we trying to perfectly clean data, or are we trying to drive value from data? Is the juice worth the squeeze, or can we drive better value - and especially nimbleness - by taking a slightly different view? Scott note: Zhamak urges people to consider "the most relevant source of truth" because there are multiple perspectives on the same things that can all have value; you have to decide what is best.

Mike warned that some use cases are politically untenable or even toxic. Especially early in your journey, consider whether participants will actually want to know the information. Yes, in the abstract we want everyone to be perfectly data-driven, but humans aren't and won't ever be. Don't ignore that and tackle something that will be more hassle than it's worth.

In wrapping up, Mike had two points. The first: learn to work incrementally. That has been somewhat the antithesis of how data work has historically been done, but it's incredibly important. The second: really lean into empowerment and the art of the possible. We don't really know what might happen when we empower thousands of our colleagues to better leverage data. Be excited and open to the journey of finding out what value they create.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
May 28, 2023 • 35min

Weekly Episode Summaries and Programming Notes – Week of May 28, 2023

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
May 26, 2023 • 22min

#225 Zhamak's Corner 23 - Driving to Resilient Data Value Now and in the Future

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Sponsored by NextData, Zhamak's company that is helping ease data product creation.
For more great content from Zhamak, check out her book on data mesh, a book she collaborated on, her LinkedIn, and her Twitter.

Key Takeaways:
- Postel's Law: be conservative in what you do, be liberal in what you accept from others.
- We can do better in data than we did when learning decentralization in services: "We have to level up. We can't repeat the past mistakes. Let's not be silly and fool ourselves just because we have a schema, now we have an amazing system."
- The services world has learned good ways of communicating between producers and consumers. We should look to learn more from them, and to adapt then adopt what works well.
- Zhamak believes we have to learn to prepare our data for future use cases. Scott note: If she means reuse of data being generated for current use cases, most agree. If she means creating data that doesn't currently serve a use case, almost everyone else seems to disagree. Time will tell.

More on Postel's Law: https://ardalis.com/postels-law-robustness-principle/

Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Data Mesh Radio episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
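To ground the Postel's Law takeaway above in code: a hedged sketch of a "tolerant reader" data consumer in Python - my illustration, not anything from the episode. The read_order function and its field names are invented; the point is that the consumer validates only the fields it depends on and treats unknown producer fields as acceptable rather than fatal:

```python
def read_order(record: dict) -> dict:
    """Tolerant reader: be liberal in what you accept from the producer.

    Only the fields this consumer relies on are validated; extra or
    reordered fields from the producer are ignored rather than fatal.
    """
    required = {"order_id": str, "amount_cents": int}
    order = {}
    for name, expected_type in required.items():
        if name not in record:
            raise ValueError(f"missing required field: {name}")
        if not isinstance(record[name], expected_type):
            raise TypeError(f"{name} should be {expected_type.__name__}")
        order[name] = record[name]
    return order  # unknown producer fields are deliberately dropped, not errors

# The producer added 'loyalty_tier' in a newer schema version;
# this consumer keeps working without any change.
print(read_order({"order_id": "A-123", "amount_cents": 4200, "loyalty_tier": "gold"}))
```

The producer side of the law - be conservative in what you do - would be the mirror image: validate strictly against the published contract before sharing data at all.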
May 22, 2023 • 1h 7min

#224 Building Out Scalable Automated Access for Data Mesh at Disney Streaming - Interview w/ Himateja Madala

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Himateja's LinkedIn: https://www.linkedin.com/in/himatejam/
Himateja's AWS re:Invent presentation (link starts at her part): https://youtu.be/y1p0BGsPxvw?t=1991

In this episode, Scott interviewed Himateja Madala, Senior Data Engineering Manager and Head of the Data Mesh Data Platform at Disney Streaming. To be clear, she was only representing her own views on the episode.

Some key takeaways/thoughts from Himateja's point of view:
- ?Controversial?: Your existing data platform(s) might not be able to serve data mesh well, even with reasonable augmentation - especially if your data platform has become hard to change. You might have to build from scratch.
- When the data platform's key users aren't part of the centralized team, think about enabling automated capabilities by default, e.g. security the second data lands, or easy-to-leverage and easy-to-understand monitoring/observability.
- ?Controversial?: Data products serving different use cases often end up looking relatively different. Is your data product for dashboards and reporting/analytics; for serving a recommendation engine or machine learning model; or more for internal usage? Be okay with data products not being uniform.
- Even if your data mesh platform operates outside the traditional paradigms, many data producers - especially data engineers - will still be thinking in data pipelines. Be prepared for that; it's an ingrained way of thinking for many.
- Data contracts are very helpful in defining and maintaining quality. If you set up good observability on your data products, owners can quickly identify quality challenges.
- When building out your platform, user conversations are crucial. Go and focus on pain points. The coolest capabilities in the world won't lead to good adoption if you aren't addressing real pain/needs.
- Automation and blueprints are key to scalability in a data mesh data platform. Teams need to be empowered to easily do their work.
- Don't only focus on creating tooling to process and share data in the abstract; dig into how teams will share information with each other, how they will communicate. That isn't only exchanging data via data products.
- Even if you have domains inside your organization that want to share data/information with each other, it is hard to get to a place where consumers can actually trust the data without a lot of explicit enablement of trust at scale. Enabling trust at scale is a key role of the platform.
- Enabling teams to go at the speed of their business by owning their own infrastructure really drives good buy-in. It might take slightly longer to get the infrastructure spun up for the first data product, but teams quickly learn and often strongly prefer the visibility and control they now have.
- When requests for new capabilities come to the data platform team, consider how to generalize the capability to apply to more use cases if possible. Sometimes the right answer is that the platform can't support a one-off need.
- Centralize your governance capabilities in the platform but federate the decision making. There should be standard approaches to access control to make it easy for people.
- !Controversial!: Disney Streaming has very strong RBAC (role-based access control) policies to make it very easy to delineate who should have access to what - but if you have a certain clearance level for one domain, you have that clearance level for all domains. Scott note: this is about 35min into the interview; it's a really interesting approach. I couldn't see it working for highly regulated industries, but it's working very well for them.
- ?Controversial?: Preventing data from leaving the mesh under any circumstances is an effective risk control - if someone somehow gets access to something they probably shouldn't, the blast radius is quite contained.
- If you have any data sharing agreements with partners/vendors, keep their access heavily contained. Create specific spaces - or cloud accounts - with strong rules to prevent them from getting improper access to any other parts of your data.

Himateja started the conversation with a situation that matches many organizations right now: Disney Streaming had many data platforms, but not one that would really fit data mesh, even with augmentation. Because the existing platforms were too hard to change to meet the needs of a data mesh implementation, she and her team decided to build their data mesh platform from the ground up.

When you have new key personas leveraging the data platform - even if those are data engineers embedded in the domains - Himateja recommends rethinking how data work is done. What do people need automated and on by default, like security? How do you create monitoring/observability that helps people easily pinpoint issues as they come up? How do you make data accessible by default at the data product and greater mesh level? In a decentralized, federated data approach, ways of working and needs will be different, so dig into the actual pain points instead of solving the same pain points as previous implementations.

Himateja shared that while people may think data products are all pretty similar, they end up relatively different based on use case. Audience also really mattered when trying to figure out what capabilities people required early in the journey - execs were often more focused on data privacy and security, data scientists on data quality. It's hard to handle business context at the platform level because many people are used to doing that via request; the data products themselves need to own business context.

Data contracts are crucial to maintaining data quality in Himateja's view. While they are certainly helpful to data consumers, they are also very helpful to data producers because - with proper observability - data product owners can quickly identify and address quality issues as they emerge, instead of waiting until consumers complain and downstream data is wrong. That proactive alerting and response helps everyone trust the data more. However, data contracts are still a work in progress because not everything is easy to define in a contract; there are definitely gray areas that are improving but not great yet. Scott note: and that's okay, we can't get everything perfect upfront, we have to iterate towards better :)
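The episode doesn't detail Disney Streaming's actual contract tooling, so the following is only a generic sketch of the idea: a data contract expressed as a small set of declared expectations, checked on every batch so the owner is alerted before consumers notice. All field names and thresholds here are invented for illustration:

```python
from datetime import datetime, timedelta

# A toy contract: what the producer promises about every published batch.
CONTRACT = {
    "required_fields": ["user_id", "event_type", "occurred_at"],
    "max_null_rate": {"user_id": 0.0, "event_type": 0.01},
    "max_staleness": timedelta(hours=24),
}

def check_batch(rows: list[dict], contract: dict = CONTRACT) -> list[str]:
    """Return contract violations for a batch; an empty list means healthy."""
    violations = []
    total = len(rows) or 1
    for name in contract["required_fields"]:
        null_rate = sum(1 for r in rows if r.get(name) is None) / total
        limit = contract["max_null_rate"].get(name)
        if limit is not None and null_rate > limit:
            violations.append(f"{name}: null rate {null_rate:.1%} exceeds {limit:.1%}")
    newest = max((r["occurred_at"] for r in rows if r.get("occurred_at")), default=None)
    if newest is None or datetime.now() - newest > contract["max_staleness"]:
        violations.append("batch is staler than the contracted 24h freshness")
    return violations

# The owner wires violations into alerting so they hear about issues
# before downstream consumers do.
print(check_batch([{"user_id": None, "event_type": "click", "occurred_at": datetime.now()}]))
```

The point Himateja makes holds regardless of tooling: the contract plus observability shifts quality from "consumers complain eventually" to "the owner is paged immediately."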
Himateja then shared a lot about what the data platform team she leads set out to do at the start of their data mesh journey. One aspect was to create a center-of-excellence approach, standardizing how data engineering work is done to create data products across the 15+ teams running on the platform now. They did that by drilling into pain points and doing lots of listening to potential users - they needed to take a different approach rather than building yet another data platform. Preventing the central data platform team from becoming a central data engineering team was a worry for Himateja: how do you avoid being a bottleneck and empower teams to do what they need to do, especially at the start of a journey? As many guests have pointed out, automation and blueprints have been crucial. Teams pushed back initially at the thought of managing their own infrastructure, but they realized it gave them the ability to move at their own pace - no more waiting in a prioritization queue for necessary infra. Another key milestone was developing tools to make cross-domain communication and data sharing easy. Domains at Disney Streaming actually wanted to share their data with each other, and the data mesh platform/implementation made that possible - it was previously very difficult to trust data, but now that quality metrics are clearly defined and tracked, data sharing and usage between domains has increased significantly.

Prior to doing data mesh, Himateja shared, data engineers in domains had no real visibility into data infrastructure - provisioning timelines or any other aspect. They'd push a ticket and wait for things to happen. With data mesh, since they own the infrastructure, they can go at their own pace and understand much more about any delays. They understand and better control their own timelines, which makes them far happier. And once they've gone through the process of spinning up infrastructure the first time, the next time they can be that much faster. They are better able to move at the speed of their business.

Himateja also shared the process of evaluating requests for new platform capabilities. As many past guests have noted, you need a process to abstract the requirements away from individual use cases and find a generalized approach; otherwise, you end up with yet another overburdened platform that you can't evolve. A specific example at Disney Streaming was enabling their Apache Kafka clusters to better communicate across domains, which were leveraging individual cloud accounts. Instead of building a data-sharing solution for each technology in the platform, they built a system to enable sharing across accounts with proper access control and privacy. And sometimes the answer is that the platform can't support a unique requirement - that's okay and often the right call.

At Disney Streaming, Himateja and team implemented a very interesting approach to access control via RBAC (role-based access control). There are a few levels of data usage clearance, but if you are at the access-to-PII level or the access-to-financial-information level in one domain, it's the same clearance level for all domains. This might not work for heavily regulated industries, but it's working very well for them. The work to decide who has access to what data is done ahead of time instead of via constant requests: they think very carefully about each use case and who should have access and why, but there is no need to manually grant access. And there is of course oversight of how people are using data, to make potential changes. Scott note: this is a really interesting approach and I'd love to hear people's feedback.
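As a minimal sketch of that clearance model as described - one org-wide clearance level rather than per-domain grants - with invented tier names and ordering, not Disney Streaming's implementation:

```python
# Ordered clearance tiers; the specific tiers and their ordering are assumptions.
CLEARANCE_ORDER = ["public", "internal", "financial", "pii"]

def can_access(user_clearance: str, dataset_classification: str) -> bool:
    """One org-wide clearance: a user's tier applies to every domain's data
    products, so there are no per-domain access requests to process."""
    return (CLEARANCE_ORDER.index(user_clearance)
            >= CLEARANCE_ORDER.index(dataset_classification))

assert can_access("pii", "financial")     # a higher tier covers lower ones
assert not can_access("internal", "pii")  # and no domain-level exceptions exist
```

Note what is absent: there is no domain parameter. That is the whole trade-off - far less access-request toil, in exchange for clearance decisions that have to be made carefully up front.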
In wrapping up, Himateja shared how they strongly limit the blast radius around sharing data with partners/vendors. Partners/vendors get access only in accounts that cannot reach any other accounts, so there is no way for them to access data they shouldn't be able to see. It's a simple security pattern, but others should consider adopting it, in her view.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
May 21, 2023 • 17min

Weekly Episode Summaries and Programming Notes – Week of May 21, 2023

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
