
Data Mesh Radio

Latest episodes

Oct 4, 2022 • 15min

#137 Even More on Driving Buy-in in Data Mesh - Mostly About Domains - Mesh Musings 31

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please rate and review us on your podcast app of choice! If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh. If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf.
Oct 3, 2022 • 1h 17min

#136 Building Your Data Platform for Change and Reusability via Modularity - Interview w/ Alireza Sohofi

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please rate and review us on your podcast app of choice! If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.

Squirrel (OSS data platform) GitHub: https://github.com/merantix-momentum/squirrel-core
Alireza's LinkedIn: https://www.linkedin.com/in/alireza-sohofi/

In this episode, Scott interviewed Alireza Sohofi, a Data Scientist focused on building the data platform at Merantix Momentum.

Some key takeaways/thoughts from Alireza's point of view - some written directly by Alireza himself:

- Where possible, look to build your platform in a loosely coupled way. It will make the platform easier to extend and evolve, and domains can replace pieces, mix and match components, or even extend functionality when it makes sense.
- It's easy to fall into the trap of building a platform that is hard to evolve and support. Be very conscious about what you want to include - and not include - in your platform. Don't try to solve every challenge with a point solution.
- To effectively share data - and the information it represents - software engineers / domains need to really understand their own data, including data modeling. That can't be easily outsourced. A platform team's job is to build the tooling so those domains only need to deal with the data, not the data engineering.
- If you want a scalable platform - in many senses of the word scalable - your platform should be relatively generic. It must also be easy to extend and augment. Focus on providing flexibility and ease of customization. One size definitely won't fit all.
- Packages and templates are both useful, but templates are typically more user friendly and easier to customize - start with templates when possible.
- If there is a need to customize or extend a package or template, it's better to first build it within a domain (with the help of the platform team if necessary). The generalized version of the new feature is then contributed back to the platform. This leads to tighter domain-platform integration, a more robust first release of new platform features, knowledge sharing, and fewer of the bottlenecks that arise from relying only on the central platform team.
- Platform teams need to A) dogfood the platform - you will learn far more by using it; B) provide good channels for domains to give feedback and make requests; and C) find better ways to exchange context with your domains regularly, e.g. pair work and scheduled informal chats.
- The platform consists of several tools that should not only work well together but should also work well with the wider ecosystem of open source tools. Solutions that try to offer end-to-end coverage usually fall short on flexibility as requirements and the business environment change. Composable components that work together are the way to go.
- Tools should be opinionated, i.e. encode best practices, but at the same time hackable to the very core.
- A layered design, where domain teams can choose the abstraction level that is appropriate for them, is a good choice.

Alireza started by sharing a bit about how Merantix works with clients - often, their clients are not that deep into machine learning and want to outsource it. So Alireza and team are essentially building a data platform that is use-case agnostic across many different data maturity levels and modalities, on both the production and consumption sides, and that is scalable and cost-efficient. Sound like a familiar challenge? While their platform is specifically for machine learning, it's a good approach to dig into, partly because they recently open sourced the platform so others can dig deep into the implementation aspects.

The most difficult challenge of the platform and of working with customers, per Alireza, is data ingestion. Clients use a vast array of source systems and formats, so the team had to focus on each customer's specific challenge. Unfortunately, this means there is a custom-built driver for each use case/dataset for ingesting customer data. But they have created a number of templates for ingestion and transformation to make the custom development - whether initial development or incorporating changes - relatively lightweight; the customization, and the coupling that comes with it, is typically limited to the business logic of the use case. So while it's not ideal, it's a scalable approach that is serving them well.

While customers are outsourcing their machine learning, that doesn't mean they are not data literate, according to Alireza. To leverage the Squirrel platform / service and get good information out for their applications, those domain teams still need to really understand their own data. Unfortunately, Alireza does not have a silver bullet for training generalist software developers to really handle their data - they must be able to model the data properly themselves. The platform team's job is to make tools so domains can deal with the data but not the data engineering - a centralized team handling the data modeling can quickly become a bottleneck.

When talking about scalability - both pure throughput and scaling across many use cases - Alireza believes you must build a platform that is generic enough that it isn't tied to specific use cases. It must be relatively easy to extend or augment the platform as well. You need to provide people flexibility and ease of customization if they want to own the complexity themselves. Easier said than done but still important to repeat. Blanca and Pablo at Plain Concepts said similar things in their episode.

According to Alireza, it's important to think about templates versus packages. Use templates where possible for simpler things, because packages, by default, have choices embedded in them. Templates are starting points that point directly to the choices people can make, with defaults, whereas with packages people have to discover the choices already made. But both can be useful. And when a template or package needs to be extended, the work should be done by the domain team - otherwise you have centralized work that can become a bottleneck. But do have the domains contribute those extensions back to the platform as well.

Because Squirrel is open source, Alireza and team had to think hard about how to make things loosely coupled, even within the platform. So the drivers, computation, storage, etc. can all be extended or even swapped out.
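As a rough illustration of that kind of loose coupling - a generic sketch under assumed names, not Squirrel's actual API - the platform can expose a small driver interface plus a registry so a domain can swap in its own ingestion component without touching platform code:

```python
from abc import ABC, abstractmethod
from typing import Iterator


class Driver(ABC):
    """Minimal ingestion contract the platform ships with."""

    @abstractmethod
    def read(self, source: str) -> Iterator[dict]:
        """Yield records from a source location."""


class CsvDriver(Driver):
    """A platform-provided default for the common case."""

    def read(self, source: str) -> Iterator[dict]:
        import csv
        with open(source, newline="") as f:
            yield from csv.DictReader(f)


# The registry is the seam: domains register components by name,
# so pieces can be mixed, matched, or replaced independently.
_DRIVERS: dict[str, type[Driver]] = {"csv": CsvDriver}


def register_driver(name: str, driver: type[Driver]) -> None:
    _DRIVERS[name] = driver


def get_driver(name: str) -> Driver:
    return _DRIVERS[name]()


class CustomerApiDriver(Driver):
    """A hypothetical domain-owned driver; the use-case-specific
    business logic (and its coupling) stays here, not in the platform."""

    def read(self, source: str) -> Iterator[dict]:
        yield {"source": source, "status": "stubbed domain logic"}


register_driver("customer_api", CustomerApiDriver)
records = list(get_driver("customer_api").read("https://example.com/export"))
```

The same seam can apply to computation and storage back ends: because everything is resolved by name behind a shared interface, a generalized version of a domain-built component can later be contributed back to the platform, as Alireza describes.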
This means each domain team can replace pieces if they truly have reason to, while still getting good leverage from what is already built.

Alireza had some direct advice: when developing your platform, be very conscious about what you want to include. You don't want to do one-offs. What can you reliably abstract because it's a repetitive need? The first time you see a new pattern, don't rush to build support for it into the platform. Otherwise, it's very easy to build a platform that is hard to evolve and support. Analyze a diverse range of challenges to find your patterns and then abstract.

It's crucial for platform teams to develop good communication and ways of working with their domain teams - or whoever their users are - and really learn how they use the platform, according to Alireza. You should provide good ways for them to give feedback and make requests, but also more informal ways of teaming up and exchanging context, e.g. pair programming and scheduled informal chats. That way, not every communication is an ask - think water cooler chat. Your platform team should be users of the platform too - the best information and feedback often comes from being a user yourself.

Alireza wrapped up on a major challenge that is yet to be well addressed: how to embed semantics into the data. We need to figure out how to "solve ontologies that don't align across domains".

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf.
Oct 2, 2022 • 24min

Weekly Episode Summaries and Programming Notes – Week of October 2, 2022

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please rate and review us on your podcast app of choice! If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh. If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf.
Sep 30, 2022 • 1h 11min

#135 Iterating Consciously - and Quietly - Towards Data Mesh Capabilities - Interview w/ Balvinder Khurana

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please rate and review us on your podcast app of choice! If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.

Balvinder's LinkedIn: https://www.linkedin.com/in/balvinder-khurana/

In this episode, Scott interviewed Balvinder Khurana, Principal Data Architect at Thoughtworks.

Some key takeaways/thoughts from Balvinder's point of view:

- Data mesh is NOT a silver bullet, and not everyone is ready to do data mesh - others have stated that, but it's crucial to repeat.
- A data mesh doesn't happen in a vacuum - you need to assess if you are really ready and whether it aligns first with your business strategy and second with your data strategy.
- If you decide to move forward on a data mesh implementation, really consider how you will measure progress and success against business goals.
- To evaluate data mesh appropriately, consider what business value better data practices would bring to your company, whether your company is aligned into lines of business or would need to reorganize, and whether you are prepared to extend your line-of-business practices to data.
- A common failure pattern in analytics has been ignoring the Intelligence Cycle - changing your operational systems and processes as a result of insights. Don't just generate insights; insights must generate action! Data mesh must avoid this too.
- Even with significant bottlenecks in existing centralized data team setups, data consumers typically do eventually get the data they need. Those consumers can see something like data mesh as a risk - will they still be able to eventually get the data they need? Is faster access to new data worth the perceived risk?
- If you face resistance to data mesh, look at delivering necessary capabilities to your data producers and/or consumers in a small-scale, incremental way, without specifically tying it in their minds to data mesh. Tie those incremental capabilities to business value.
- Look to constantly communicate the what and the why of improvements to your platform to drive engagement. What is the purpose? Why should they care?
- It's easy to fall into the trap of trying to iterate everything constantly. There is a concept of "good enough for now". Don't be focused on getting everything perfect; that juice is not worth the squeeze.
- A good signal to reevaluate your domain boundaries: a dashboard is still used, but the owner no longer uses it. They don't want to own it any more, so you need to find a new owner. That probably means changing business boundaries if the original user doesn't use it anymore.
- To drive buy-in for data transformation, especially something like data mesh, look to drive it both top-down and bottom-up if possible. It can work with only one, but support from both makes it much, much easier.
- It's okay to start out with lots of manual processes; just be conscious about where you want to automate.
  Take on tech debt consciously.
- It's easy to get caught up in trying to optimize for time-to-market on every use case. But you need to balance time-to-market against quality and scope. You don't get to pick all three, and might not even get to pick two. Look at what's really crucial.

Balvinder started by discussing the big question: is data mesh right for an organization - right in general, and especially right for them right now? You need to start by questioning what your key strategies are - business strategies before data strategies - and where you are right now. Data mesh isn't your data strategy, and implementing data mesh doesn't happen in a vacuum - the real world is messy and ever-changing. So you need to line up your data strategy with your business strategy, and then you get to the fun of "okay, how do we measure our success against our business goals?"

Once you have your business goals and how you will measure against them in place, Balvinder believes it's time to look at how data can help you reach those specific business goals and/or measure your success against them. Again, have those business goals in place first. Then you start to look at your data operating model to see if it will help you achieve those goals and that measurement. That's when you start to ask if data mesh is right for you.

When evaluating if data mesh is right for you, Balvinder recommends a multi-pronged approach. First, again, look at the value cases: what does your organization want to achieve with data? Second, ask what challenges you are facing right now. Will data actually help you address those challenges? What are you trying to do with data, and how would a more mature data practice help you achieve your goals? Third, look at how you are structured: do you have boundaries already set up around domains/lines of business? If these are aligned, then you can start to consider data mesh.

Not paying attention to the Intelligence Cycle is one way Balvinder believes analytical approaches have failed in the past - we need to keep an eye on preventing that failure mode in data mesh. The Intelligence Cycle is about taking information, analyzing it, and then pushing the results of that analysis - either directly or indirectly - back into your operational systems and processes. Essentially, the "so what" of doing data analysis - is there an actual impact? Actionable insights are only valuable if you actually take the action, after all.

Balvinder shared the story of an existing client with a centralized data team and an existing data platform: a request for new data would go to the central data team; the data team would reach out to a number of potential data producing teams to try to figure out how to service the request. It took 2-3 months to get data from request to delivery. But it was a known quantity, even if it was slow. They knew who to reach out to. So they were hesitant to rethink their approach, because they could eventually get the data they needed. Would it be the same in data mesh? That's a risk many might not be willing to take.

In order to implement data mesh in that kind of environment and alleviate that perceived risk, you should focus on business continuity, per Balvinder. By moving more slowly and only making small incremental changes to the existing platform and ways of working, they were able to move more and more towards data mesh without having to get everyone bought in ahead of time.
Adding capabilities and explaining the business value of each capability - instead of trying to sell the data mesh concept as a whole - meant they could continually deliver business value. Each incremental capability was a thin value slice, with the selling point being business value rather than the tech. They created smaller milestones that led to a better path for their client.

And it paid off, according to Balvinder - the client team felt empowered to do things for themselves with data. That team wanted to launch a new KPI, and thanks to the gradual addition of capabilities and resources through a better platform, they already had everything they needed. So they didn't need to go to the central team, and quickly launched their new KPI.

Constant communication was crucial for Balvinder and team to drive engagement and buy-in. It's important not to communicate just once, as things will inevitably evolve as the real world changes. It also means you can deliver continuous improvements instead of trying for a big bang approach. Your domains will continuously change, your platform will need to evolve, etc. There is a concept of too much iteration and change as well - it's okay to assess if something is "good enough for now". You don't need things to be perfect; look to change where there is the most benefit.

For Balvinder, it's really easy to get overly focused on "I want this in production yesterday." But when a data consumer says that, they must understand they need to compromise on quality and scope. You don't necessarily even get to pick two; there are aspects of all three, and if you focus on time-to-market, the quality won't be what they want and the scope will need to be limited. So what really matters most to the business? And if the answer is always time-to-market, Scott believes you might need to talk to your data consumers about what really matters.

Balvinder wrapped up on a theme many guests have touched on: data mesh is not some silver bullet. It won't solve all your challenges. You need to really think about whether it's right for you. And there are many companies that aren't ready for data mesh. And that's okay too!

A few quick tidbits:

- A good signal to reevaluate your domain boundaries: a dashboard is still used, but the owner no longer uses it. They signal they don't want to own it any more, so it needs a new owner. That probably means changing boundaries, because the business need that drove the creation of the dashboard is no longer with the team that created it.
- To drive buy-in for data transformation, especially something like data mesh, look to drive it both top-down and bottom-up if possible. It can work with only one, but support from both makes it much, much easier. Having management support that the data initiatives/strategy align to the business strategy is important, but so is buy-in from the people actually doing the work :)
- It's okay to start out with lots of manual processes; just be conscious about where you want to automate. Take on tech debt consciously. When it's time to improve that capability in the platform, invest in it. But don't try to build everything ahead of time.

Data Mesh Radio is hosted by Scott Hirleman.
If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf.
Sep 28, 2022 • 16min

#134 Zhamak's Corner 5 - From Pipeline to Product, a Crucial Evolution in Data and Data Tooling

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Sponsored by NextData, Zhamak's company that is helping ease data product creation.

For more great content from Zhamak, check out her book on data mesh, a book she collaborated on, her LinkedIn, and her Twitter.

Please rate and review us on your podcast app of choice! If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Data Mesh Radio episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh. If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf.
Sep 26, 2022 • 1h 18min

#133 Nitty Gritty From the Deployment Committee: Crucial Learnings on Driving Buy-in and Data Product Discovery - Interview w/ Ammara Gafoor

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please rate and review us on your podcast app of choice! If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.

Email: ammara.gafoor@thoughtworks.com
Ammara's LinkedIn: https://www.linkedin.com/in/ammara-gafoor/

Data Mesh in practice article series from Ammara and colleagues:
#1: https://www.thoughtworks.com/en-us/insights/articles/data-mesh-in-practice-getting-off-to-the-right-start
#2: https://www.thoughtworks.com/en-us/insights/articles/data-mesh-in-practice-organizational-operating-model
#3: https://www.thoughtworks.com/en-us/insights/articles/data-mesh-in-practice-product-thinking-and-development
#4: https://www.thoughtworks.com/en-us/insights/articles/data-mesh-in-practice-technology-and-the-architecture

In this episode, Scott interviewed Ammara Gafoor, Principal Business Analyst at Thoughtworks, who has been working on a few client projects related to data mesh, including one for well over a year.

Before jumping in, it's important to note that much of Ammara's learnings come from an implementation in a 100K+ employee company split into 21 high-level domains. So the definition of domain in this episode revolves around that context of a very large business unit, not a two-pizza-team-sized sub-domain.

Some key takeaways/thoughts from Ammara's point of view:

- There is often a hangup around data work, especially relative to data mesh, where people want to get it all right, all perfect, the first time. That's never going to work. Get something decent out there, test, and iterate. Perfect is the enemy of done. No bikeshedding!
- If you don't look to change domains' KPIs to align their operational work to data mesh, "you won't prioritize it - you cannot prioritize it." Make it easy for domains to prioritize data mesh work if you want it to get done.
- Possibly controversial: source oriented data products should not be made available to business users within the domain or to almost anyone in other domains - at least by default - as they are difficult to understand for anyone other than the highly data literate people in the domain.
- "Don't make things that you don't need yet." Build data products for use cases you've identified. Think of the target business outcome and work backwards to create the necessary data products.
- Far too many people try to build customer 360 solutions. And they very rarely work. No one ends up happy.
- Sometimes it seems like data mesh literature says to get rid of many existing data roles - but the people in those roles are subject matter experts in how the organization and/or the domains do data. You want to leverage them as their roles evolve, and they can pick up new skills related to data mesh.
- When driving buy-in, be prepared to repeat yourself multiple times. And then repeat yourself multiple times more... Learn to speak the language of the domain as well to drive buy-in with the business people.
- Look for the 2 most obvious use cases inside a domain.
  Supporting those use cases, you will need 5-6 foundational source oriented data products that will support many - most? - of the use cases for that domain. It will appear like you need many more source oriented data products, but zooming out, you will see you don't need nearly that many.
- Typically (using the median), use cases are powered by two consumer oriented data products, and each consumer oriented data product is powered by three to four source oriented data products.
- They planned for a single approach to work with all domains to start implementing and contributing to the data mesh. That proved not to work well, and every domain needed its own process to really make it all work. There are commonalities, of course, but there isn't a cookie cutter approach, unfortunately.
- It's very easy for people within the same domain to talk past each other, especially in a functionally siloed org structure. Data mesh can be a forcing function to find a shared, common language and drive better results.
- From Team Topologies, there are 3 streams: the operating model, the product stream, and the technology stream. They should all be given equal weight. No, you can't just focus on the tech stream...
- There may be push-back from both IT and the business side to having the business people collaborate closely in a data mesh implementation. But it's very important to overcome that - collaboration with the business people is crucial to making data mesh work.
- Business leaders may have - or at least believe they have - what they need already via shadow IT. It may be difficult to convince them to change their ways of working "for the greater good".
- When looking at driving buy-in with domains, look at how they are incentivized. Often, short-term results are rewarded more than mid- to long-term results. If next year's funding is based on this year's results, they will focus on this year's results...
- Typical personas in a data mesh implementation: the IT sponsor, the business owner, the "no stance/sideline", the "says yes to your face...", the product owners, and the data lake builders (aka those feeling the target on their backs).

Ammara started off the conversation sharing how she and her team "had it all laid out": the plan was to standardize how they'd bring each domain up to speed on data mesh - from the introduction of new ways of working to being ready to participate in the data mesh implementation in just six weeks. And then reality struck. Each domain is different, and much like trying to explain the benefits or implementation of data mesh, a single approach for all audiences just didn't work well, so they adapted. Every domain is unique and required its own approach to make implementing data mesh in that domain work. There are of course some commonalities, but each of the 13-14 domains that are part of the data mesh implementation thus far has had its own unique challenges.

So Ammara shared some stories about working with different stakeholders. Often, the first stakeholder they encountered was an IT sponsor for the domain itself - either an IT leader in the domain or an IT counterpart for the domain. This persona typically wanted to bring them in and welcomed them with open arms. And while they were often bought in on data mesh, there was a push - from IT and often the business side - to only speak with IT. So Ammara and team had to work to get permission to also include the business people in the conversations about the proposed data transformation.
Because without business support and knowledge, your data mesh implementation is likely to fail. How many episodes have said to tie your data strategy to your business strategy? But the business people often already have what they need via shadow IT. So why would they want to give that up? It's an emotional response to be asked to give up what you have for the greater good and the long-term.

There is the concept of immediate returns - you build a dashboard and there is immediate potential value - versus the mid- to longer-term returns from things like building your data platform and building out your data governance capabilities. Ammara has seen many times that there is no incentive to wait and focus on the mid- to long-term returns - if your funding this year is based on results this year, focusing on results 2-3 years out often doesn't feel like an option. They won't get rewarded for that long-term work. And most domains don't even have the capabilities to do that mid- to long-term high-value work. But to do data mesh right, we need to incentivize patience - and incentivize, and provide the capabilities, to do things right for the long haul instead of just the short-term, low-stakes wins.

According to Ammara, as part of a successful data mesh implementation, there is the technical stream - the Team Topologies meaning of work stream - but you must also work on the operational stream at the same time. And the product stream too. If you don't look to change domains' KPIs to align their operational work to data mesh, "you won't prioritize it - you cannot prioritize it." You need to put a metric in place to measure progress - it doesn't even have to be a great measure! It's a way to start the conversation. There is too much of a hangup in data mesh around trying to get things perfect the first time. Get it done, measure it, iterate on it, and move forward. Don't let perfect be the enemy of done and/or good. Don't fall into bikeshedding.

The cost of change and the cost of failure in data have historically been very high, per Ammara. But we have new economic models with cloud that make that no longer true. We now have "the privilege to be able to fail". Failure wasn't an option historically. But that's such a foreign concept to many that it will cause some to push back. They have lacked the psychological safety to fail. We have to understand why they are pushing back and work with them to understand that failure in a highly agile environment is incremental learning.

After picking the 2 most obvious use cases in a domain - again, the very large business unit concept of a domain - Ammara believes it will reveal the 5-6 foundational source-aligned or "source oriented" data products of the domain that will be able to power most use cases. So just start building the MVPs of those source-aligned data products, because they will support other use cases down the road as well.

On personas, Ammara laid out a few she and team have run into:

- The IT sponsor - typically a Data Architect or Data/Analytics Lead; bought in to data mesh, likely after feeling the pain points as Zhamak has laid out. Trying their best to go wide on getting people bought in on data mesh, with some - but not a ton of - social capital to influence. Their social capital is more with the IT/data people and less with the business side of the domain.
  They are critical to get things moving.
- The Business Owner - generally supportive of the data mesh initiative but doesn't have the time - or the incentive - to spend on the data mesh implementation. You're trying to get their support with the promise of making their lives easier.
- The Sideline Watcher - sees data mesh as probably 'yet another data trend'. Not pushing back but not taking a stance. Waiting for the tide to turn one way or another before making their own waves.
- The "Yes to Your Face" - will say yes to you and then just go do whatever they were going to do anyway… These are inevitable - try not to take it personally.
- The Product Owners - they are building the dashboards or the analytical solutions, desperate for the data. They really WANT to work with you but don't know exactly how - how can they get the resourcing when you're asking them to rethink the way they do their work? Help them figure out how they can partner where possible.
- The data lake (or other historical data paradigm) builders - have spent so much time and effort building a viable data lake/warehouse/etc. They often fight you because you're going against everything they've built. It's not personal against the data mesh team, but it is personal if you put all their hard work aside. They can build data initiatives very well; try to work with them and let them know you're building off the knowledge they've gained, if not their direct work.

For Ammara, a lot of the data mesh literature and conversations feel like they say there are new roles and therefore there isn't room for many existing data roles, like the data warehouse or data lake builders/maintainers. But she thinks that's not a great idea - and Scott agrees. They are subject matter experts in how the domain's data flows and systems actually work, and they can be excellent guides to bringing more people into the data fold as they themselves pick up new skills. Trying to hire your way to a data mesh is not a great idea… No one is redundant; everyone has valuable knowledge, for Ammara.

You need to make your IT sponsor successful in order for your data mesh implementation to go broad in that domain, and that means learning - and communicating in - the language of the business, according to Ammara. That might mean you have to deal with the horror of PowerPoint presentations. And as many guests have said, the selling points and implementation details of data mesh don't stick with the broader audience the first time. Repetition, reframing, holding of hands, etc. You won't succeed if you try to message just once. Be prepared to repeat yourself. And then repeat yourself again.

Ammara gave an example of why data mesh can really help improve communication and drive towards a common language. In manufacturing, there is the concept of "on time, in full delivery" as a very crucial KPI. The domain had analytics teams constantly asking to build this for the different manufacturing lines while, at the same time, the business side said they didn't have the information. How could that be when there were 10+ completed "on time, in full delivery" projects that had been funded? Once Ammara and team removed the data team from the picture, the business folks were able to talk with the regular IT team, and they came to a shared, common understanding of what was actually needed and what was missing. It's pretty easy to lose sight of the actual need and use case when people are siloed by function.

It is crucial to understand the three streams of work model, per Ammara.
The operating stream is "building the cadence for IT and business to communicate" in order to prioritize. This helps identify which data products will be built. The product stream is identifying the actual data products that need to be built, as in what are the scope and boundaries. The technical stream is about building the data product and the platform needs. Each of the three streams should have equal weighting. This is another way to think about your MVP thin slice: you must encapsulate some of each capability, each stream.

As previous guests have noted, many domains build data products that benefit themselves first, in Ammara's experience. This obviously makes things easier because there is more buy-in and no cross-domain communication and prioritization friction. But that is just the initial stage of a data mesh implementation - still phase 1, before going truly broad. More domains are moving to support use cases across domains, so phase 2 might be up soon.

Ammara believes source oriented data products - ones that are difficult to understand outside the domain - should not be made freely available on the mesh; they should not be made available to business users within the domain or to other domains. Her reasoning is sound: if the data products are difficult to understand, it's easy to misuse them, and they are more likely to change with the source systems, so breaking changes/versions are more common. Other domains can consume the information from those source oriented data products through specially designed consumer oriented data products instead of directly. Data scientists are a bit of another story, as they are data literate enough to do some spelunking - but even then, data scientist beware.

Ammara is also seeing an interesting pattern relative to source oriented data products. When you really start to map out a lot of obvious use cases for a domain - and remember, the size of a domain in this context is quite large - it might seem like you need a large number of source oriented data products. But when you zoom out further, it becomes clear that you can actually shrink those down to a much smaller number - the 5-6 data products mentioned earlier for that domain.

The way things are evolving at Ammara's current client is 3 layers relative to data products and use cases. For each use case, there are one or more - typically two, it sounds like - consumer oriented data products. Then each consumer oriented data product is derived from, or powered by, typically three to four source oriented data products. So the domains are able to create multiple consumer oriented data products off the same set of 5-6 source oriented data products. But it's still early days and things will likely evolve further :)

Encourage people to think business need first instead of data first, according to Ammara. Think about what business outcome you are trying to achieve and then work backwards to what data you need to address it. If we are just sharing information without intention, it can lead to misuse of data - will people really understand it? Will it be compliant use? Sarita Bakst had really good thoughts on this in her episode. Don't make things that you don't need yet.

Data Mesh Radio is...
Sep 25, 2022 • 28min

Weekly Episode Summaries and Programming Notes – Week of September 25, 2022

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please rate and review us on your podcast app of choice! If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh. If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf.
Sep 21, 2022 • 17min

#132 Zhamak's Corner 4 - The World Is Changing, Are You Accounting for That?

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Sponsored by NextData, Zhamak's company that is helping ease data product creation.

For more great content from Zhamak, check out her book on data mesh, a book she collaborated on, her LinkedIn, and her Twitter.

Please rate and review us on your podcast app of choice! If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Data Mesh Radio episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh. If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf.
Sep 20, 2022 • 17min

#131 More On Driving Buy-In in Data Mesh - Mesh Musings 30

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please rate and review us on your podcast app of choice! If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh. If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf.
Sep 19, 2022 • 1h 23min

#130 Making the Data Quantum Leap - Starting from the Data Quantum at PayPal - Interview w/ Jean-Georges Perrin (JGP)

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please rate and review us on your podcast app of choice! If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.

Data Mesh at PayPal blog post: https://medium.com/paypal-tech/the-next-generation-of-data-platforms-is-the-data-mesh-b7df4b825522
JGP's All Things Open talk (free virtual registration): https://2022.allthingsopen.org/sessions/building-a-data-mesh-with-open-source-technologies/
JGP's LinkedIn: https://www.linkedin.com/in/jgperrin/
JGP's Twitter: @jgperrin / https://twitter.com/jgperrin
JGP's YouTube: https://www.youtube.com/c/JeanGeorgesPerrin
JGP's Website: https://jgp.ai/

In this episode, Scott interviewed Jean-Georges Perrin, AKA JGP, Intelligence Platform Lead at PayPal. JGP is probably the first guest to lean into using "data quantum" instead of "data product". JGP did want to emphasize that, as of now, he was only discussing the implementation for his team, the GCSC IA (Global Credit Risk, Seller Risk, Collections Intelligence Automation), within PayPal.

Some key takeaways/thoughts from JGP's point of view:

- Data mesh, as it's been laid out by Zhamak, obviously leaves a lot of room for innovation. For some, that's great. Others want the blueprint. And it's okay to wait for the blueprint. But JGP and team are excited to innovate!
- PayPal's 3 main initial target outcomes from data mesh: A) faster and easier data discovery, B) making it easier to use data in a governed way, and C) increased data consumer trust in data.
- PayPal's initial data consumers are data scientists, so their platform and data quanta are built to serve that audience first.
- Really consider what you want to prove out in your MVP. Is that minimum viable A) data quantum, B) data platform, C) data mesh, or D) something else? Doing only a data quantum probably sets you up for trouble, and a platform alone won't be tested until it has data quanta on it.
- Data contracts are crucial to making trustability actually measurable and agreed upon. Otherwise, it's far too easy to have miscommunication between data producers and consumers, which leads to a lack/loss of trust.
- Producers, don't set your data contract terms too strictly when first launching a data quantum. There's no need to over-engineer - despite how interesting that can sometimes be...
- For too long, we have tried to keep software engineering and data engineering overly separate. They are both just engineering with slightly different focuses, and data mesh really leans into that.
- We've also tried to keep operational and analytical far too separate. We should look to build out tooling where data can live that serves both operational and analytical workload needs.
  But we aren't there yet.
- Analytical APIs, at least as we've seen them to date, are just not going to do what we need relative to accessing data from data products/quanta, according to JGP.
- Standardizing metadata access APIs across data quanta has made it very simple for data consumers to begin using new data quanta as they are introduced to the mesh. PayPal has observability, discovery, and control APIs.
- Domain is an overloaded word. It can mean a very large high-level domain like Sales, Finance, or Marketing with hundreds or thousands of people in it, or it can mean a smaller, 'two pizza team' level scale.
- PayPal is doing only one data quantum per domain, but that domain is really at the two pizza team scale - they aren't trying to have a single data quantum for Marketing.
- It's crucial to understand that data quanta, and the use cases they power, both have life-cycles. So really applying product thinking is crucial.
- Most data engineering teams work in a waterfall approach, and that just doesn't scale well. However, moving to data mesh can mean additional cognitive load, as it really requires an Agile mindset to do right, and that shift in ways of working is not trivial.
- It's good to have smaller delivery requirements so you get faster feedback on what you are creating - a core tenet of Agile. Don't try to deliver everything all at once. Get it in users' hands early to get feedback.

JGP started the conversation talking about how, in his team, he's really leaning into the idea that software engineering and data engineering are not that different. Zhamak has discussed this too. We should focus on sharing practices so we all create better software and infrastructure. For JGP, data engineering work in most organizations has followed a very waterfall approach. However, his team has been mostly working in an Agile manner, so it wasn't a huge switch to their ways of working - like it is at many organizations - once they started doing data mesh. And luckily, there was already an appetite for changing the way they were tackling data challenges.

In the spirit of being agile - and capital-A Agile as well - PayPal set out on their data mesh journey. They wanted to do an MVP, but what was the P? Minimum Viable Data Product/Quantum? Minimum Viable Platform? Both? Minimum Viable Mesh? JGP recommends looking at what you want to deliver as a minimum unit of value. PayPal already had extensive data platform expertise, so they were able to focus on delivering data products/quanta (plural of data quantum), but they worked in parallel to build out their initial data quantum and mesh capabilities. As many guests have noted, it's dangerous to only do a minimum viable data product/quantum.

PayPal has been building data platforms for a long time. As mentioned by JGP, they were one of the pioneers of the self-service data platform concept. But data mesh offered a path to faster and easier data discovery, to making it easier to use data in a governed way, and to increased trust in data by the data consumers - their first consumers being data scientists. A big benefit of addressing those needs is that those data scientists can better tell if the data they access is the right data for their use case.

One thing JGP emphasized that's significantly helping PayPal move forward is standardizing APIs across data quanta. Those are not data access - or analytical - APIs, as JGP thinks those will just never work all that well.
Instead, as their audience is data scientists only to start, everything anyone needs other than the actual 1s and 0s of the data is accessible via Python APIs: the metadata, the observability/trust data, etc. The data scientists then use notebooks to work with the data. Standard APIs mean data consumers only have to learn one interface. This is similar in concept to what many are doing with data marketplaces - one standardized way to interact with the information about the data quanta.

PayPal uses the terms data product and data quantum as two separate things. A data product is simply a product powered by data and analytics - those have been around for quite some time. PayPal looks at data quanta like sidecars, used specifically to power more and more of their data products going forward.

PayPal has invested heavily in making data contracts work well, per JGP and earlier PayPal guest Jay Sen. They've been building APIs to make it far easier to consume data contracts as people learn more about a data quantum. And as mentioned before, they can consume observability metrics via API as well. When asked how they set the actual contractual terms: the data producers initially put out some contractual terms and then may adjust those terms as data consumers request. It's important for data producers not to set their data contract obligations too strictly unless there is a user-based need.

JGP made a good and often unspoken point: the term domain has lost a lot of its meaning. It can mean a very high-level domain like Marketing, Finance, Sales, or HR. Even in software companies, a domain could be Product. But at PayPal, they are being quite strict about what they mean by domain in data mesh: it is a small-scale sub-domain - think two pizza team size - and they are enforcing a strict 1:1 relationship of one data quantum per domain; and of course, no cross-domain source data quanta either. That way, each small domain can focus on creating a great data quantum instead of worrying too much about how big each data quantum should be. The scope should never get that huge at a two pizza team size.

Back to APIs: PayPal is implementing an API-first approach - APIs for the data quantum control plane, observability APIs, and data discovery APIs. It's the preferred way of working for their initial consumers, data scientists. However, as mentioned previously, JGP does not believe analytical APIs - that is, APIs designed to do things like filtering and returning many hundreds to thousands or more results - are really feasible. Definitely not now and possibly not ever. So APIs are great for getting at the metadata but not the data for analytical use, in his view.
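To make that API-first idea concrete, here is a minimal sketch of a standardized metadata surface for a data quantum - every name and shape below is a hypothetical illustration, not PayPal's actual implementation:

```python
from dataclasses import dataclass, field


@dataclass
class DataContract:
    """Producer-declared terms a consumer can rely on.
    Kept deliberately loose at launch, per JGP's advice."""
    owner: str
    update_frequency: str           # e.g. "daily"
    max_null_rate: float = 0.05     # don't over-promise on day one
    schema: dict = field(default_factory=dict)


class DataQuantum:
    """Every quantum exposes the same three metadata surfaces,
    so consumers only ever learn one interface."""

    def __init__(self, name: str, contract: DataContract):
        self.name = name
        self.contract = contract
        self._metrics: dict[str, float] = {}

    # Discovery API: what is this quantum and what does it contain?
    def describe(self) -> dict:
        return {"name": self.name, "owner": self.contract.owner,
                "schema": self.contract.schema}

    # Observability API: is it currently meeting its contract?
    def observability(self) -> dict:
        null_rate = self._metrics.get("null_rate", 0.0)
        return {"null_rate": null_rate,
                "within_contract": null_rate <= self.contract.max_null_rate}

    # Control API: operations on the quantum itself.
    def refresh(self) -> None:
        self._metrics["null_rate"] = 0.01  # stand-in for a real pipeline run


# A data scientist interacts with any quantum the same way
# (hypothetical example quantum, named after the episode's team):
quantum = DataQuantum(
    "seller_risk_scores",
    DataContract(owner="gcsc-ia", update_frequency="daily",
                 schema={"seller_id": "string", "risk_score": "float"}),
)
quantum.refresh()
assert quantum.observability()["within_contract"]
```

Because every quantum answers the same discovery, observability, and control calls, a data scientist who has learned one quantum's interface has effectively learned them all - the property JGP credits for making newly introduced quanta easy to start using.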
JGP wrapped up by sharing how our tooling must evolve so we don't have to think about such a hard wall between analytical and operational. There will always be analytical and operational workloads, but our systems can evolve to support both. We aren't there yet though.

Quick tidbit: If you are just delivering data, the 1s and 0s, you are not delivering the necessary trust.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf.
