
Data Mesh Radio

Latest episodes

May 4, 2022 • 9min

#70 For Your Sanity, Stop Trying to Solve it with Technology - Mesh Musings 13

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice! If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh. If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf.
May 3, 2022 • 1h 13min

#69 Getting Data Sharing Right at Netflix Scale - Interview w/ Justin Cunningham

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice! If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.

Justin's LinkedIn: https://www.linkedin.com/in/justincinmd/

In this episode, Scott interviewed Justin Cunningham, who worked as a tech lead and data architect on data platforms at Netflix, Yelp, and Atlassian over the last 8.5 years. In that time, Justin was involved in initiatives to push data ownership to developers / domains.

To sum up one of Justin's recurring points: he recommends creating a pool of low-effort data, which will inherently be low quality, and using it for initial research into what might be useful. Focus on maximizing accessibility - you can still have governance, using things like join restrictions or giving consumers the ability to self-certify that they are using the data responsibly. Once you have the use cases, then you go build the data-mesh-quality data products. Justin saw at Yelp that focusing on data availability - getting data to a place where it could be found and played with - was a bigger driver of success than focusing initially on data quality. Once people discovered what data was available and how they might use it, the organization was able to work towards getting that data to an acceptable quality level.

Another point Justin made: figure out what you want to optimize for in general - getting things right upfront, or testing and changing. He believes in optimizing for change. Create an adaptive process and optimize for learning. Keep it simple and focus on value delivery - it will set up more tractable bets.

At Yelp, they were trying to ETL a huge amount of data into their data warehouse to build reports for the C-Suite. But they were never going to get enough data ingested to meet their goals - it was taking 2 weeks to create each new set of ETLs, and that was just creation, not maintenance. It looked like they'd need 5x the number of people. What Justin found most useful at Yelp was to focus on getting as much "usable" data as possible in an automated way. They achieved this initially through the data mesh anti-pattern of copying directly from the underlying operational data stores and building business logic on top. But getting that data into the hands of the data team meant there could be an initial value assessment - once they proved there could be value in the data, it was much easier to get developers to care about providing clean and reliable data.

Justin mentioned the same thing Wannes Rosiers mentioned in his episode: there are operational and analytical workloads, but there should absolutely not be that separation when it comes to the data itself. Data from operational systems is useful for analytics and vice versa. One thing that really helped developers understand how to share data was thinking of data sets as being similar to public APIs.

At Netflix, there were simply too many bespoke data sets, which made it very hard to manage quality.
What they found worked was a data certification program for data sets: creating tooling to prove a data set was complete and accurate. That, plus a much bigger focus on data set reuse, significantly helped them combat the data sprawl.

Back to data accessibility and availability versus quality: Justin believes data analysts and data scientists initially care far more about getting access to data - you can work to improve the data quality later, especially if there is a clear owner. Scott discussed this in a Mesh Musings episode about speculative data products, but a key hack for them was being able to mark data as low quality.

On driving buy-in from data producing teams, Justin again talked about proving there was value in data before asking producers to commit. Asking them to serve their data upfront without a clear, specific use case was very tough - the return on investment (ROI) was very squishy. So they got low quality data out initially and then came back to producing teams to raise quality and reliability once they proved certain data was valuable. This is somewhat similar to the emerging data mesh pattern of creating your data products for a consumer-focused use case. It might be a source-aligned data product, but it should still initially serve a specific purpose with a targeted outcome. It can grow from there.

Justin also shared his thoughts on how the way we do data lineage is broken - we should do lineage declaratively instead of just as a reference, flowing through both the schema registry and the data catalog. What is the data movement supposed to be? This would let us much more easily test data flows and alert downstream users of upcoming changes (see the sketch after these notes).

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf.
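To make Justin's declarative-lineage idea a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical - the data set names, the edge structure, and the check are invented for illustration, not taken from the episode or from any specific tool. The shape of the idea: lineage is declared up front (alongside the schema registry and catalog), and observed data movement is diffed against the declaration so both undeclared flows and broken flows surface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LineageEdge:
    """One declared movement of data from a source data set to a target."""
    source: str   # e.g. "billing.raw_invoices" (hypothetical name)
    target: str   # e.g. "finance.monthly_revenue"
    job: str      # the pipeline/job that is supposed to perform the move

# Hypothetical declared lineage - in Justin's framing, this declaration would
# live alongside the schema registry and the data catalog.
DECLARED: set[LineageEdge] = {
    LineageEdge("billing.raw_invoices", "finance.monthly_revenue", "jobs.revenue_rollup"),
    LineageEdge("billing.raw_invoices", "growth.churn_features", "jobs.churn_etl"),
}

def check_lineage(observed: set[LineageEdge]) -> None:
    """Diff observed data movement against the declaration.

    An undeclared edge is a flow nobody signed up to support; a declared
    edge that is not observed is a broken (or retired) flow whose
    downstream consumers should be alerted.
    """
    for edge in observed - DECLARED:
        print(f"WARN: undeclared flow {edge.source} -> {edge.target} ({edge.job})")
    for edge in DECLARED - observed:
        print(f"ALERT: declared flow {edge.source} -> {edge.target} not observed")

# Example: the churn ETL stopped running and a new ad-hoc copy appeared.
check_lineage({
    LineageEdge("billing.raw_invoices", "finance.monthly_revenue", "jobs.revenue_rollup"),
    LineageEdge("billing.raw_invoices", "ops.scratch_copy", "jobs.adhoc_copy"),
})
```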
May 2, 2022 • 1h 16min

#68 The Build vs. Buy Dilemma of Data Platforms - Interview w/ Doron Porat

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice! If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.

Doron's LinkedIn: https://www.linkedin.com/in/porat-doron/
Our journey towards an open data platform: https://medium.com/yotpoengineering/our-journey-towards-an-open-data-platform-8cfac98ef9f5
A simplified, lightweight ETL Framework based on Apache Spark by Yotpo: https://github.com/YotpoLtd/metorikku
The Data Swamp (in Hebrew): https://open.spotify.com/show/5YDdtRhp1RVw7r5fbYFtPQ?si=5x4HzOyhTX6n46HqY5kV6w&nd=1

In this episode, Scott interviewed Doron Porat, a Data Infrastructure Leader at the SaaS company Yotpo.

Some crucial points Doron made:
1) Be kind to yourself when you make mistakes - it's worse to stagnate, so don't be afraid of change and making choices.
2) Build versus buy is always tough, but don't let your ego get in the way and push you towards building everything.
3) If you do buy, build a close relationship with your vendors to help influence the roadmap and to have an outlet if you are having issues.
4) A data platform team's job is to drive usage, as usage means creating value - drive towards that and set your KPIs around platform usage.
5) There will likely be many different types of consumers of your data platform - work to improve / optimize the user experience for most folks.

Doron is a technologist at heart, so for each decision she instinctively wants to build instead of buy. At the start of building out the data platform for Yotpo, that was typically her decision. But as the demands on the platform grew - along with the increasing ubiquity and quality/scalability of as-a-service offerings and the growing need to drive usage and developer happiness rather than manage cool tech - she started to consume more and more managed services.

When you are building out the platform, vendors can often serve your needs better in the long run because they have a whole lot of people focused specifically on making what you use better. You need to make bets on vendors getting to where you need them to be, and sometimes those bets don't pay off. To raise the odds that they do, Doron recommends building relationships with your vendors to influence their roadmaps and to get help when necessary.

Doron strongly recommends putting together a framework for evaluating build/buy decisions (see the sketch after these notes). Some of the factors she considers: how extensible the offering is, whether it takes on too many challenges in one solution, cost, the cost of later migration to or from a managed service, open source compatibility, etc. One thing Doron talked about that many teams seem to struggle with is the ego hit of admitting that someone else managing a service will drive more value. That's always tough but needs to be addressed.

Doron talked about the strong need to drive your platform forward, not just be responsive: provide a roadmap, set time aside for innovation, etc. She also made the point that there is a difference between learning to leverage a tool and learning to operate it.
You want to build out the knowledge around leveraging a tool whether you are operating it yourself or not; when you do build, the balance is making sure you don't focus too much on one at the expense of the other.

Doron said the general job of a data platform team is to drive usage, because usage means creating value. Serving users' critical needs is crucial to driving adoption, and the user experience should center on the business logic - no matter how cool the tech is, a data platform team's job isn't to expose that tech to users. Your team's KPIs should reflect usage. However, there are all kinds of users - on both the consumer and producer side - that you need to serve, so don't create a one-size-fits-all experience. If your data platform isn't easy to adopt, no one will want to use it; your platform should make things easy, even fun, to do. There also needs to be a big focus on enabling data testing, including staging environments for data - that will drive usage.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf.
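As a concrete (and entirely hypothetical) illustration of the evaluation framework Doron recommends, here is a small weighted-scorecard sketch in Python. The factors follow the ones she listed; the weights and ratings are invented placeholders you would replace with your own.

```python
# Hypothetical weighted scorecard for one build-vs-buy decision.
# Factors follow Doron's list; weights and ratings are placeholders.
FACTORS = {
    "extensibility": 0.25,        # how far can we bend the solution?
    "scope_fit": 0.20,            # does it take on too much in one solution?
    "total_cost": 0.25,           # licence/run cost vs. engineering time
    "migration_cost": 0.15,       # cost of later moving to/from a managed service
    "open_source_compat": 0.15,   # plays well with our open source stack?
}

def score(ratings: dict[str, float]) -> float:
    """Weighted score for one option; each factor is rated 0-10."""
    return sum(ratings[factor] * weight for factor, weight in FACTORS.items())

build = {"extensibility": 9, "scope_fit": 8, "total_cost": 3,
         "migration_cost": 7, "open_source_compat": 9}
buy   = {"extensibility": 5, "scope_fit": 7, "total_cost": 8,
         "migration_cost": 4, "open_source_compat": 6}

print(f"build: {score(build):.2f}  buy: {score(buy):.2f}")  # build: 7.00  buy: 6.15
```

A number like this shouldn't make the decision for you, but writing the factors and weights down keeps ego out of the conversation - exactly the failure mode Doron warns about.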
May 1, 2022 • 25min

Weekly Episode Summaries and Programming Notes - Week of May 1, 2022 - Data Mesh Radio

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice! If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh. If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf.
Apr 29, 2022 • 1h 12min

#67 All About Interoperability and Standards in Data Mesh - Interview w/ Samia Rahman

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice! If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.

Samia's LinkedIn: https://www.linkedin.com/in/samia-rahman-b7b65216/
FHIR standard cheat sheet: https://www.healthit.gov/topic/standards-technology/standards/fhir-fact-sheets

In this episode, Scott interviewed Samia Rahman, Director of Data and AI Strategy and Architecture at life sciences company Seagen. Samia is helping to lead Seagen's early data mesh implementation after helping with two implementations at Thoughtworks since the start of 2019.

For Samia, interoperability is about taking information from two systems and combining it to get higher value. A simple definition but a good one.

Two potential key takeaways:
1) Don't try to plan too far ahead when developing interoperability standards, but definitely keep an eye out for places where you could start to develop them. And your standards really, really should evolve - you don't have to nail them right out of the gate.
2) Your interoperability will also evolve - you don't need to make every data product interoperable with every other data product, and you can start with basic interoperability first. The more you can standardize around unique identifiers, the better, but it's okay not to get it right first thing out of the gate.

Samia started her career - and even before, in school - focused on software, especially end-to-end development. A repeating pattern for her has been how crucial contract testing is to getting things into a trustable and scalable state. We've had contract tests in hardware and software for a long time, and systems without easy testing often get replaced pretty quickly. Those tests are the safety net that allows fast and reliable evolution. And evolution is a key theme of this conversation - set yourself up to iterate and evolve as you learn. Work to not paint yourself into a corner.

Data standards, including specifically for interoperability, are everywhere in the life sciences space - FHIR, many from the FDA, etc. - but they're still not great for truly sharing the meaning of the data. FAIR is trying to get there, but the interoperability and domain knowledge isn't really standardized yet.

Samia strongly recommends not getting ahead of yourself on interoperability and standards. It's perfectly okay to start small - iterate and build on your standards for interoperability. To start, have some key identifying "linkers" in place. Get things out in front of consumers so they can explore and give feedback, and use that to power your iterations. Incrementally building towards a standard is crucial.

If you are going to build a standard, reusability should be your first goal. If it is only for a single use case, that isn't a standard, it's just an implementation detail. Samia again recommends contract testing / a schema checker (see the sketch after these notes). And definitely leverage existing standards. It's also not a huge deal if you have more than one standard internally.
You don't need one standard to rule them all.

Per Samia, if you implement versioning, data consumers are usually very willing to work with data producers as they evolve data products. Without versioning, you are just pulling the rug out from underneath them. And right now, there isn't a lot of good information - or tooling - out there for versioning data. The need to evolve data products is why absolute self-service is probably never possible; the human-in-the-middle is important to help consumers evolve their thinking as the business model evolves.

Samia mentioned the data consumer's responsibility to inform data producers - about needed changes, issues with their data products, etc. We can't have data consumers all going off and creating their own fixes to data quality issues; the data producers need to know so they can fix them at the source.

You need to be on the lookout for interoperability opportunities and then validate that there is an actual need for interoperability. An important point: not all data needs to be interoperable.

Samia finished with her interoperability vendor wish list: tooling that can more easily detect when someone should use an existing standard and that puts those standards in front of data product producers much more easily. How can we make it very easy for data product producers to build in interoperability and leverage existing standards from the start?

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf.
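As a minimal illustration of the contract testing / schema checking and identifier "linkers" Samia describes, here is a hypothetical sketch in Python. The field names, the PT-style identifier format, and the contract structure are all invented for illustration - a real life sciences implementation would lean on standards like FHIR rather than hand-rolled checks.

```python
import re

# Hypothetical minimal contract: required fields plus the agreed format for
# the shared "linker" identifier. All names and patterns are invented.
CONTRACT = {
    "required_fields": {"patient_id", "trial_id", "recorded_at"},
    "linker_pattern": re.compile(r"^PT-\d{8}$"),  # agreed patient-linker format
}

def check_record(record: dict) -> list[str]:
    """Return the contract violations for one record (empty list = compliant)."""
    errors = []
    missing = CONTRACT["required_fields"] - record.keys()
    if missing:
        errors.append(f"missing required fields: {sorted(missing)}")
    patient_id = str(record.get("patient_id", ""))
    if not CONTRACT["linker_pattern"].match(patient_id):
        errors.append(f"patient_id {patient_id!r} does not match the shared linker format")
    return errors

# A compliant record, then one that breaks both rules.
print(check_record({"patient_id": "PT-00001234", "trial_id": "T-07", "recorded_at": "2022-04-29"}))
print(check_record({"patient_id": "1234"}))
```

The point is the shape, not the code: the contract is explicit, versionable, and checkable in CI, so it can evolve incrementally the way Samia recommends.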
Apr 27, 2022 • 11min

#66 Negotiation as Your Avenue to Success in Data Mesh - Mesh Musings 12

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Scott shares his views on the importance of collaboration via negotiation, not requests, to make your data mesh implementation a success.

Please Rate and Review us on your podcast app of choice! If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh. If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf.
Apr 26, 2022 • 1h 1min

#65 What's a Data Contract Between Friends - Setting Expectations with Data Contracts - Interview w/ Abe Gong

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice! If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.

Abe's Twitter: @AbeGong / https://twitter.com/AbeGong
Abe's LinkedIn: https://www.linkedin.com/in/abe-gong-8a77034/
Great Expectations Community Page: https://greatexpectations.io/community

In this episode, Scott interviewed Abe Gong, the co-creator of Great Expectations (an open source data quality / monitoring / observability tool) and co-founder/CEO of Superconductive. One caveat before jumping in: Abe is passionate about the topic and has created tooling to help address it, so try to view Abe's discussion of Great Expectations as an approach rather than a commercial for the project/product.

To start the conversation, Abe shared some of his background living the pain of unexpected upstream data changes causing data chaos and lots of recovery work. Part of where we need to get to with something like data contracts is removing the need to recover at all, moving instead towards controlled/expected adaptation. Abe believes the best framing for data contracts is to think of them as a set of expectations.

Expectations here include not just schema but also the content of the data: value ranges, types, distributions, relationships across tables, etc. For instance, a column may be a one-to-five ranking and then the application team changes it to one-to-ten. The schema may not be broken - it is still passing whole numbers - but the new range is not within expectations, so the contract is broken (see the sketch below).

Currently, Abe sees the best way to avoid breaking social expectations is to get consumers and producers into a meeting to talk through upcoming changes and prepare, such as with versioning. But as tooling improves, Abe sees a world where we won't need many of those meetings - either because data pipelines can be "self-healing" and automatically adapt to upstream changes, or because metadata and tools for context-sharing will reduce the need for them.

Abe sees two distinct use cases for data contracts - or, more specifically, for how people use Great Expectations to implement data contracts. The first is purely defensive: put validation on the data you are ingesting to prevent data that doesn't match from blowing up your own work. The second is when the consuming team shares their expectations with the producers and there is a more formal agreement - or contract - with a shared set of expectations. The first often leads to the second, via an agreement conversation that happens after an upstream breaking change.

Abe also mentioned a third constituent in the room on data contracts: the data itself. Sometimes the consumers and producers agree on what they expect, but if that's different from what's in the actual data, it's hard or dangerous to move forward. The data has a veto.
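To make the rankings example concrete, here is roughly what that expectation looks like in Great Expectations' classic pandas API. This is a sketch only - exact call signatures and return types vary across versions of the library.

```python
import great_expectations as ge
import pandas as pd

# Wrap a plain DataFrame so expectation methods become available on it.
df = ge.from_pandas(pd.DataFrame({"rating": [1, 4, 5, 3]}))

# The schema still validates if the app team silently moves to a 1-10 scale
# (the column stays integer), but this content-level expectation - the
# contract - starts failing.
result = df.expect_column_values_to_be_between("rating", min_value=1, max_value=5)
print(result.success)  # True today; False after an unannounced 1-to-10 change
```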
There was an interesting discussion on the push versus pull of data contracts: should the producing team create an all-encompassing contract, or should we have consumer-driven contracts? Would producer-driven contracts be too restrictive, preventing the serendipitous insights data mesh aims to produce? Would consumer-driven contracts mean multiple contracts for each data product that the producer agrees to? Is that sustainable?

To sum it up: a set of explicit expectations around a data product, arrived at through collaboration between producers and consumers, sounds like where we should all head if possible. If the expectation set comes only from the producer side, it might be overly restrictive and miss a lot of the nuance necessary to actually create consumer trust. And exclusively consumer-driven contracts don't sound sustainable or scalable.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf.
Apr 25, 2022 • 1h 9min

#64 The Crucial Value of Data About Your Data: Approaching Data with a Product Mindset - Interview w/ Sadie Martin

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice! If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.

Sadie's LinkedIn: https://www.linkedin.com/in/sadie-martin-06404125/

In this episode, Scott interviewed Sadie Martin, Senior Product Manager, Data Platform at Q4 Inc, about applying a product mindset to data in general. This is crucial to getting data as a product right, but also to building out your data platform and even some of your processes for data mesh.

Scott's summation of some key points:
- Anyone can apply a product mindset, not just the product manager.
- Giving yourself time before starting work to investigate and create your measurement framework, including your baselines, is crucial to measuring data work progress and choosing where to focus.
- Approach your data work with intentionality.
- Really understand what you are trying to accomplish and what your immediate customers/consumers are trying to accomplish with the data.

Sadie started as a data analyst on a team that didn't have a product manager - they were doing a lot of work without knowing whether things were likely to work, or even whether what they did had a positive impact afterwards. So she started to take on the work of answering those questions and transitioned into being a product manager for data.

So what is a product mindset? For Sadie, the easy definition - with lots of hidden depth - is "it's all about really understanding the problem". For most organizations, really thinking about the problem you are trying to solve is new where data is concerned. There may be a data request, but what product or process is that data contributing to, and what is that product or process trying to solve?

Sadie believes measuring the problem is crucial. Once you figure out what you are trying to solve, what is the scope of the problem? How will you measure whether you are actually solving it - and, especially, whether it's better than what you were doing before? She also talked about the importance of customer-centricity: why are they really making a data ask? Should this be a one-off or a repeatable process? Did they ask for the complete set of what they need?

One crucial insight Sadie brought from product management to data is to be willing and ready to throw things away. If it ain't working, don't be too precious. That's a very different mindset than we've historically had with data. Processes can also devolve quickly, so when you start a repeatable data process, make sure you understand the effort required to keep it going.

While it feels counter-intuitive, Sadie laments that it's often quite difficult to get buy-in for the idea that you need data to measure whether your data work is actually providing value. It's still worthwhile to do. You need to take the time to do spikes and investigate ahead of time, and slow down enough to set yourself up to measure results.
Continuing to go off assumptions and gut feelings puts you in a vulnerable spot against a competitor who does the work.

Sadie looks at measuring the success of data work in two ways (see the sketch after these notes). The first feels obvious once said but really isn't: start by measuring the baseline. Without that baseline, you can't measure whether you're having an impact. And lots of data work proves to be low or negative value - you tried a hypothesis and it isn't working, so stop and move on. How do you get to that answer fast? The second: you measure the incremental change to gauge effectiveness.

So what happens when you look at your work and find out it hasn't been valuable? Per Sadie, you have to get away from the sunk cost fallacy. It's absolutely okay to make bets that don't pay off - you move on. You need to really investigate whether you are solving the problems you set out to solve, and proving out the value of the product mindset lets you make better bets in the future.

A lot of the product mindset is also thinking about the return on investment, not just maximizing the return or value of data work. Can the simple approach get you where you want to go without the extra cool but complicated and/or risky parts?

Sadie mentioned a few things getting in the way of applying the product mindset to data. One is that teams often make promises on behalf of the data team without checking with them first. Another is that many data consuming teams view the data platform team as simply a service team, not a partner. And while there has been a lot of hiring for data product managers in the last year or so, Sadie sees that companies often aren't making the product mindset an actual priority, which feels like a waste of a good product manager.

There is a misconception that data work is all about facts. A large part of it is discovery work, much more than in most disciplines. Per Sadie, measuring a team's effectiveness should focus more on getting to an answer than on getting to preferred answers. Evaluating a lot of hypotheses and proving them invalid isn't a bad thing - you prevented a lot of toil that wouldn't have added value. Make sure to measure teams on that basis.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf.
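As a trivial illustration of Sadie's baseline-first measurement idea, here is a hypothetical sketch in Python; the conversion metric and the numbers are invented placeholders.

```python
# Hypothetical before/after measurement for a piece of data work.
def lift(baseline: float, current: float) -> float:
    """Relative change against a baseline captured *before* the work started."""
    return (current - baseline) / baseline

baseline_conversion = 0.042  # measured before shipping the new data product
current_conversion = 0.047   # measured after

print(f"incremental change: {lift(baseline_conversion, current_conversion):+.1%}")
# -> incremental change: +11.9%  (without the baseline, there is no number at all)
```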
Apr 24, 2022 • 27min

Weekly Episode Summaries and Programming Notes - Week of Apr 24, 2022 - Data Mesh Radio

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice! If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh. If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf.
Apr 22, 2022 • 1h 38min

#63 Driving Domain Maturity Through Empathy, Respect, and Understanding - Data Innovation Summit Takeover Interview w/ Henrik Göthberg

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice! If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.

This episode is part of the Data Innovation Summit Takeover week of Data Mesh Radio.
Data Innovation Summit website: https://datainnovationsummit.com/; use code DATAMESHR20G for 20% off tickets
Free Ticket Raffle for Data Innovation Summit (submissions must be in by April 25 at 11:59pm PST): Google Form
Henrik's LinkedIn: https://www.linkedin.com/in/henrikgothberg/
Dairdux website: https://dairdux.com/
Airplane Alliance website: https://airplanealliance.com/

In the last of the interviews for the Data Innovation Summit Takeover week, Scott interviewed Henrik Göthberg, the Founder and CEO of consulting company Dairdux, the Co-Founder of the Airplane Alliance, and the Chairman of the Data Innovation Summit.

Let's start with some conclusions/advice from Henrik:
- When working with other departments, in data mesh or not, you need to start from respect, empathy, and understanding for people in different roles.
- When you think about maturing a domain or process, a big bang approach very rarely works. Think evolution, not revolution.
- To find a good pathway to maturity, start with the domains already on the leading edge, the innovators; trying to get the laggards to catch up, instead of focusing on those who see value in maturity, will lead to pain and likely not much progress.
- Start with less complicated, lower-risk challenges so you can learn and develop the right muscles to do things more easily in the future.
- Focus heavily on reuse - reusable data, yes, but also templates and other "easy path" enablers. To succeed in data mesh, you need to get to a place of broad reusability: reusable data, reusable processes, reusable templates, reusable tooling, etc.
- In a data mesh implementation, start with an initial domain but move on to adding a second domain quickly if possible. Templates will get you to value quickly.
- It's okay to skip automating or building out a great solution for certain pieces of your data mesh implementation. What will get you in trouble is building half-solutions that end up as major pain points - the biggest source of unintended tech debt.
- If your business people don't understand that they own the processes and the data, your data mesh implementation is much more likely to fail.

Background and other color:

Henrik covered his journey from 2012 to the present in most of the first 30 minutes: joining a domain to add analytics capabilities, then building out a large central data and analytics team at the same company, then joining a new company in 2019 to help implement a new data strategy that has evolved into implementing data mesh.

Henrik joined Vattenfall to build out the data and analytics team inside the sales org. They had a multi-country domain with different maturity levels in each country.
They needed to improve the data and analytics capabilities and operations in all three countries so they could be strong at both the country and the European level. The team had some technical savvy, but they were struggling to actually get at the data - it was locked into the source systems. It was difficult to even do basic customer analysis and data science, not to mention anything fancy. So they needed a lot of help maturing.

In 2015, Henrik became the Business Intelligence Officer at Vattenfall, taking ownership of the centralized team with lots of core data and analysis. A big part of the role was providing costs in very granular ways, so he needed to move to a very standardized reporting model for P&L. A big change was in consumer maturity: when Henrik first started the role, people were mostly consuming reports; over time they moved to consuming data sets and even raw data. As part of that, they often moved from ETL to ELT, which caused some major headaches, as many have seen with the data lake.

All that background maturing data and analytics capabilities helped Henrik when he joined Scania, a truck manufacturer, in their financial services division. The company culture was already very decentralized and modular, which can set up well for data mesh, but it also meant domains were very independent, with limited standards or standardization around data enterprise-wide. They had a big data lake implementation with a good raw data layer and a semantic layer, but the analytics layer on top was lacking. The centralized data team was struggling to even manage the raw data layer from a governance perspective, and they were feeling increasing strain from issues managing data pipelines.

Henrik mentioned the necessary evolution process for domains - a "big bang" approach very rarely works. He started with the domains in the innovator category, as they were the most bought in on domain maturity. As part of this process, they were able to decommission many large data warehouses.

To start, Henrik focused on what was valuable to build for the domain - the micro level - instead of what was valuable to the greater organization. That way, he could mature that domain much faster, and once there are multiple mature domains, they are better prepared and more capable of working with each other. There was a focus on building in reuse wherever possible - not just reusable data, but also templates and other easy-path things the team could create.

After year 1 of focusing on creating value from the data products individually, Henrik and Scania started to focus more on creating value at the overall mesh level - this is where data product interoperability really comes into play.

Before you get going on a data mesh journey, Henrik recommends spending the time to really plan out how you think your implementation will work and how it will create value for the organization - and what the near-term and longer-term value adders will be. Henrik strongly believes in either taking on a challenge with the intention of getting to a good solution now, or not tackling it at all. Half-assed solutions just lead to far more pain, so either commit or leave it entirely for later.

Another piece of advice: don't have the domain teams just hire without consulting the central team, especially if there is a central team around that competency.
Look instead to embed people from the central team into your domains so they can understand the friction points and build out templates to address them.

For Henrik, it's key to find the right people in each domain who can be a sensible buyer. There needs to be a high level of trust between the business and IT, so you need someone who can develop a strong relationship with IT. For Henrik, you need to start from respect, empathy, and understanding for people in different roles in order to actually form a strong relationship. Business people often think it's not that hard to set up your data and analytics processes well. Focus on investing time and energy with the key players to develop a good relationship - that way, it is much easier to get to each other's context.

Henrik wrapped up by noting that to succeed in data mesh, you need to get to a place where you can have broad reusability: reusable data, reusable processes, reusable templates, reusable tooling, etc. He also believes that domains, especially the business people inside the domains, need to understand that they own the business processes AND the data.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf.
