
Data Mesh Radio
Interviews with data mesh practitioners, deep dives/how-tos, anti-patterns, panels, chats (not debates) with skeptics, "mesh musings", and so much more. Host Scott Hirleman (founder of the Data Mesh Learning Community) shares his learnings - and those of the broader data community - from over a year of deep diving into data mesh.
Each episode contains a BLUF - bottom line, up front - so you can quickly absorb a few key takeaways and also decide if an episode will be useful to you - nothing worse than listening for 20+ minutes before figuring out if a podcast episode is going to be interesting and/or incremental ;) Hoping to provide quality transcripts in the future - if you want to help, please reach out!
Data Mesh Radio is also looking for guests to share their experience with data mesh! Even if that experience is "I am confused, let's chat" about some specific topic - yes, that could be you! You can check out our guest and feedback FAQ, including how to submit your name to be a guest and how to submit feedback - anonymously if you want - here: https://docs.google.com/document/d/1dDdb1mEhmcYqx3xYAvPuM1FZMuGiCszyY9x8X250KuQ/edit?usp=sharing
Data Mesh Radio is committed to diversity and inclusion, including in our guests and guest hosts. If you are part of a minoritized group, please see this as an open invitation to be a guest - just hit the link above.
If you are looking for additional useful information on data mesh, we recommend the community resources from Data Mesh Learning. All are vendor independent. https://datameshlearning.com/community/
You should also follow Zhamak Dehghani (founder of the data mesh concept); she posts a lot of great things on LinkedIn and has a wonderful data mesh book through O'Reilly. Plus, she's just a nice person: https://www.linkedin.com/in/zhamak-dehghani/detail/recent-activity/shares/
Data Mesh Radio is provided as a free community resource by DataStax. If you need a database that is easy to scale (read: serverless) but also easy to develop for - with many APIs including gRPC, REST, JSON, and GraphQL, all of which are OSS under the Stargate project - check out DataStax's AstraDB service :) Built on Apache Cassandra, AstraDB is very performant and, oh yeah, is also multi-region/multi-cloud so you can focus on scaling your company, not your database. There's a free forever tier for poking around/home projects and you can also use code DAAP500 for a $500 free credit (apply under payment options): https://www.datastax.com/products/datastax-astra?utm_source=DataMeshRadio
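If you're curious what those APIs look like in practice, below is a minimal sketch of writing a JSON document through Stargate's Document API over REST, assuming its v2 endpoint layout. The database ID, region, namespace, collection, and token are all placeholders from your Astra dashboard, so treat this as illustrative rather than copy-paste ready.

```python
# Minimal sketch: write one JSON document to AstraDB via Stargate's
# Document API (REST). All identifiers below are placeholders.
import requests

ASTRA_DB_ID = "your-db-id"       # placeholder - from the Astra dashboard
ASTRA_DB_REGION = "us-east1"     # placeholder - your database region
ASTRA_TOKEN = "AstraCS:..."      # placeholder - an application token

# Document API path: namespace ("demo") and collection ("episodes")
# are made up for this example.
url = (
    f"https://{ASTRA_DB_ID}-{ASTRA_DB_REGION}.apps.astra.datastax.com"
    "/api/rest/v2/namespaces/demo/collections/episodes"
)
doc = {"show": "Data Mesh Radio", "episode": 92, "topic": "governance"}

# Stargate authenticates REST calls via the X-Cassandra-Token header.
resp = requests.post(url, json=doc, headers={"X-Cassandra-Token": ASTRA_TOKEN})
resp.raise_for_status()
print(resp.json())  # response includes the generated document ID
```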
Latest episodes

Jun 26, 2022 • 28min
Weekly Episode Summaries and Programming Notes - Week of June 26, 2022 - Data Mesh Radio
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Jun 24, 2022 • 45min
#92 Good Data Mesh Governance Through Empathy and Partnership - Interview w/ Jay Como and Elizabeth Calloway
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.
Liz's LinkedIn: https://www.linkedin.com/in/elizabeth-negrotti-calloway/
Jay's LinkedIn: https://www.linkedin.com/in/jaycomoiii/
In this episode, Scott interviewed Jay Como, Head of Finance Data, and Elizabeth (Liz) Calloway, Director of Finance Data Products at Silicon Valley Bank. To be clear, they were only representing their own views and experiences.
Some key takeaways/thoughts from the conversation:
- The governance team should "wear them down with empathy." Take the time to share your context, learn their context, and make people feel seen and heard. That will get them to see you as a partner - and good governance is truly about partnering, not mandating or being a gate/hurdle to get past.
- Great governance is the pathway to great data. Great data leads to great decisions, which lead to great outcomes. Share that path to great outcomes so people can see a clear answer to "why are we doing this?" Governance isn't just risk mitigation; it can be a significant - if almost always hidden - value driver.
- To drive governance buy-in from data producers, again, lead with empathy. Let them in on the "why" - why does this matter? What is the business value? How can this benefit them?
- "Help me help you" is a good approach when talking to internal teams about data governance. You are there to drive value for them and take work off their plates when appropriate.
- You can further drive buy-in by helping teams get to quick wins. While the long term is obviously important, incremental value-add beats a big-bang approach.
- Providing a constant stream of value - including executing on where you are helping - will drive teams to want to work with the central governance function. Drive value and the buy-in naturally comes with it.
- Good data governance is necessary to avoid fees, fines, and the huge revenue/business impact bad data can have. But don't use those as a boogeyman; don't use fear to sell good data governance.
- It's easy for a centralized governance team to become a bottleneck. Focus not on solving all the problems for teams but on being there to help when they need it. You are the backstop, not the stop sign. Be the support, not the roadblock.
- To prevent governance in general from becoming a bottleneck, you must have flexibility and pragmatism. If exceptions to requirements are necessary, those exceptions are often a valid response to other constraints/pressures. But be very explicit about the reason and type of each exception, and equally explicit about the expectations for if/when it will be remediated.
- Good governance is about incremental improvements, not trying to do everything as a big bang. Set expectations and move forward. Fix for today and prevent the same issue in the future.
Jay started the discussion talking about the concept of governance-as-a-service - in other words, providing a service to internal stakeholders instead of a mandate to comply or else.
And while governance may not be the most "sexy" aspect of data, it is certainly one of the most crucial. For Jay, without good governance, data is often not nearly as correct and clean as it could be, so internal stakeholders aren't making decisions as good as they could. He laid out a simple framework: great data leads to great decisions, which lead to great outcomes. There is obviously a law of diminishing returns, so find the line where the juice isn't worth the squeeze.
As for driving buy-in internally, Jay recommends "wearing people down with empathy". And be ready to repeat yourself, maybe with small variations, until you find what resonates. You rarely win people over with a single conversation or presentation.
Liz started off by pointing out that it's quite easy to convince data consumers that good data governance provides value. It's driving that buy-in upstream that becomes a lot more difficult. To get moving on that buy-in, Liz recommends flipping the script and changing the narrative and dialogue. It isn't driving a mandate; it's talking about what benefit this has for them and the organization. It's not governance for the sake of it! That way, producers are more open to being guided or shepherded towards providing good data. So in all your communications, lead with empathy.
Jay discussed how they are taking on the really hard parts of governance from their business partners. That way, those business counterparts see the governance team as a legitimate partner, not a dreaded gatekeeper. Provide them with a constant stream of valuable insight and work - decks, plans/roadmaps, etc. - and then execute on what you say you will. That drives buy-in. Jay also mentioned that acting like you are a highly regulated entity, even if you aren't, will set you down a good path to understanding and protecting your data. It's an investment, but also a big risk mitigation and a potential value driver.
Liz talked about how, while governance is not a direct revenue stream - even though it adds value to data and drives better outcomes - it is often a major source of cost avoidance. Obviously, for a highly regulated industry like banking, that can mean fees and fines. But many people are also pointing to things like Unity Software's $110 million negative revenue impact from ingesting bad data, announced in mid-May 2022. It's not about driving fear for Liz. And talking about how bad data has a negative impact on the business feels basic or a bit obvious to her. But having the conversations, showing how better governance drives better outcomes, and executing on the tough parts the governance team takes on - all of that will drive buy-in.
Scott asked how, if the governance team is taking on work from other teams, they avoid becoming a bottleneck. Jay first acknowledged it's a real possibility and one to specifically try to avoid. One thing that has worked for him historically is to play the neutral party and extract the context so both sides understand each other instead of battling each other. Make everyone feel seen and heard rather than solving all the problems for them. Fixing issues where possible - not so teams become reliant on you, but so they see you as a partner - has worked well for him to date.
Jay discussed a few ways to prevent general bottlenecks in the governance process. In data, we often take on tech debt unintentionally or sweep it under the rug.
But Jay recommends being flexible with requirements while calling out very specifically when, why, and how you are waiving a requirement in that specific situation (a small, hypothetical sketch of such an exception record follows these notes). It's not being lax; it's being pragmatic - not everything has to be perfect as long as all parties are aware and there is a mitigation plan in place. Being explicit about exceptions and expectations means you can move much quicker. Be flexible but realistic.
Speaking of being realistic, Liz talked about how there is no magic wand to suddenly fix governance challenges. And things are never going to be in a perfect state. Good governance is about incremental improvements: providing roadmaps, executing on them as a partner, and making your business partners feel heard along the way. Things change, and being flexible is crucial. How can you fix the problem of today and then put processes in place to prevent that problem in the future? Nothing is black and white.
Jay quoted Jerry Maguire: "Help me help you!" The governance team is all about enabling other teams to drive value and prevent issues. So cultivate a perception of being the helpful team. It's also quite difficult to get "your" plan implemented - it's much easier to get "our" plan implemented.
In new school data governance, as opposed to old school, Jay believes we need to get people to care about data governance at all - it can be a value-add, not just a hassle. Both Liz and Jay emphasized that you can prove data governance is a value-add by helping teams get to quick wins - show the value. You want to balance that with some long-term wins, not only short-term ones, but it's still another endorsement for driving to quick wins, as Joe Reis emphasized in his episode.
Jay and Liz wrapped up by talking about the need for a great team and to drive forward with empathy. You need to work as partners with your internal constituents. Take some work off their plates to make things easier for them so you can drive more value together.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
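The episode doesn't prescribe a format for these explicit exceptions, but to make the idea concrete, here is a hypothetical sketch of what an exception record could look like. All field names, the review helper, and the example values are invented for illustration.

```python
# Hypothetical sketch of an explicit governance exception record, capturing
# the when/why/how Jay recommends calling out when waiving a requirement.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class GovernanceException:
    requirement: str                 # which requirement is being waived
    reason: str                      # why this exception is a valid response
    exception_type: str              # e.g. "temporary", "scope-limited"
    granted_to: str                  # the owning team
    remediation_due: Optional[date]  # explicit expectation for when it's fixed

    def is_overdue(self, today: date) -> bool:
        """Flag exceptions whose agreed remediation date has passed."""
        return self.remediation_due is not None and today > self.remediation_due

# Example: a temporary waiver with a clear remediation expectation.
exc = GovernanceException(
    requirement="column-level data quality checks",
    reason="legacy source lacks lineage; migration planned for Q4",
    exception_type="temporary",
    granted_to="finance-data-products",
    remediation_due=date(2022, 12, 31),
)
print(exc.is_overdue(date(2023, 1, 15)))  # True - time to revisit it
```

Keeping records like this in one reviewable place is one way to stay flexible without the exceptions quietly becoming permanent tech debt.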

Jun 21, 2022 • 10min
#91 The Case of the Missing Data Mesh Zealots - A Mesh Mystery! - Mesh Musings 19
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Jun 20, 2022 • 1h 15min
#90 Sharing Data Reliably in Hyperscale Mode - Interview w/ Björn Smedman
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.
Björn's LinkedIn: https://www.linkedin.com/in/bjornsmedman/
In this episode, Scott interviewed Björn Smedman, Engineering Manager at Communication Platform-as-a-Service (CPaaS) company Sinch.
Some interesting thoughts or takeaways:
- A good indicator that decentralizing your data team might make sense is the cognitive load on a centralized data team. How many systems - and how complex - are they managing? How much of their time is spent in meetings, especially trying to understand context/requests? Is there starting to be combative prioritization from multiple domains?
- It can be very beneficial and scalable to apply data mesh principles to non-analytical use cases, especially sharing data for application purposes.
- It is still often difficult to prioritize creating a data product for machine learning without knowing the business value of the ML model. But the ML team needs the data first before they can figure out that business value. You have to make speculative bets.
- If you see the data platform team start to dig into the semantics of a use case, that's a red flag that people are trying to leverage them as a data team. While you want a centralized data platform team, you probably don't want it to become a centralized data team.
Since December 2020, Sinch has raised nearly $2 billion USD. With this funding, they have made a number of sizeable acquisitions, with the company growing from 500 employees to over 3,000 in about a year. This has led to some interesting challenges in sharing data in a hyper-scaling environment. Per Björn, data is a key part of Sinch's plans for growth. Sinch's operational systems are often very transactional - some product lines process tens of thousands of monetary transactions a second - so data that might typically be shared on the operational plane in other companies is shared on the data plane, lest the operational data stores have to deal with billions of events. That makes the data challenges even more complex than for most organizations. Then add in the regulatory requirements of telecom.
Björn helped lead the move to decentralize the data team. When he joined, the central data organization was 4 teams and 25 people. The data function was previously centralized, and that was becoming a bottleneck even for the legacy business. Once the company had acquired a number of other sizeable companies, that central data team setup clearly wasn't going to scale. The company reorganized around business units (BUs) and started to build data and analytics teams inside each BU.
For Björn, who started in December 2021 just as Sinch started acquiring new businesses, the central data team was clearly not going to be able to meet the needs of this new organization, roughly 8x larger than a year earlier.
There was too much cognitive load on the team, especially trying to understand the product lines of five distinct business units, many of which were entirely new to the company.
Björn gave a few good indicators to look for when considering whether to decentralize your data team. A big one is team cognitive load, and cognitive overload can take many forms: how many systems - especially complex ones - are your teams managing? Do they really deeply understand those systems? Do they need to in order to work with them? How many competencies do the team, and each individual on it, need for their day-to-day work? What percent of time is spent in meetings, especially follow-up meetings? One indicator that wasn't mentioned: request turnaround times lengthening.
Sinch had a strong signal that they should move towards a decentralized data team approach, per Björn: the business announced in early February 2022 that it was organizing itself into five distinct business units. The business units started to build data and analytics capabilities internally, but there would be a very distinct need for teams to share data with each other, so a common self-serve data platform was necessary. Without a common platform, each business unit would need custom integrations with the other BUs - four custom integration points per BU. And since each one probably would not be bi-directional, they might need eight per BU: four for sharing data out and four for ingesting data (the sketch after these notes works through the math). Obviously, that would be a pretty bad situation.
Regarding machine learning, Björn mentioned how difficult prioritization can be - there is a chicken-and-egg issue: before data-producing teams are willing to do the work to create a data product that will feed an ML model, they want to know how valuable the ML model will be. But the ML teams need access to the data first to determine that value. Thus the need for speculative bets, and those are hard to prioritize.
Björn worked with the central data platform team to build out a common data platform with a data lake, data warehouse, and streaming capabilities. They are using as many open standards as possible, since that prevents lock-in and often means more integrations are available. The goal of the platform is to make it easy to do the necessary data engineering work for every business unit. But Björn mentioned it is important to prevent the data platform team from becoming another data team - if you see your data platform team start to dig into the semantics of a use case, that's a red flag.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
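Here is the integration arithmetic from the episode worked out in a few lines. The function names are ours, but the numbers - four and eight integration points per BU across five business units - come straight from the conversation.

```python
# Back-of-the-envelope math: with n business units and no common platform,
# each BU needs point-to-point integrations with every other BU - and if
# links aren't bi-directional, one outbound plus one inbound link each.

def integrations_per_bu(n_bus: int) -> int:
    # (n - 1) peers, times 2 for separate sharing-out and ingesting links
    return 2 * (n_bus - 1)

def total_point_to_point(n_bus: int) -> int:
    # every ordered pair of BUs needs its own directed integration
    return n_bus * (n_bus - 1)

n = 5  # Sinch's five business units
print(integrations_per_bu(n))   # 8 per BU, as in the episode
print(total_point_to_point(n))  # 20 directed links overall, versus just
                                # n connections to one shared platform
```

The quadratic growth is the point: every BU added makes the point-to-point picture worse, while a common self-serve platform keeps the connection count linear.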

Jun 19, 2022 • 25min
Weekly Episode Summaries and Programming Notes - Week of June 19, 2022 - Data Mesh Radio
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Jun 17, 2022 • 1h 16min
#89 Flexibility is Your Friend in Delivering Buy-In; But Be as Rigid as You Can - Interview w/ Luca Paganelli
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.
Luca's LinkedIn: https://www.linkedin.com/in/paganelliluca/
In this episode, Scott interviewed Luca Paganelli, Data Architect at the Italian utility Gruppo HERA.
To start, some interesting points and/or key takeaways and questions:
- Introducing new concepts and ways of working around data slowly - not looking to make a hard shift - has worked well. When Gruppo HERA debuted their new data strategy manifesto, none of it was a surprise; it was already relatively in line with how many were talking about data and moving forward on their data journeys.
- HERA's Data, Analytics, and Intelligence Automation (DAIA) team is not forcing domains to comply with HERA's data mesh-inspired guidelines. Instead, it works closely with domains to help them achieve their data-related goals - delivering the "right thing". That gives the DAIA team strong influence over the domains' approach to data work without pushback, gives domains better confidence in the guidelines, and mitigates analysis-paralysis risk. This lack of rigidity and hard rules created a better sociotechnical environment for innovation, but it can mean nothing really feels standardized, because the domains can still choose to go a different direction.
- The paradigm shift was initially "steep" for both IT and domain owners. But domain owners realized how much better they could serve themselves and external data consumers if they took on more data ownership. IT was afraid to give up control but started to buy in to the leverage and expertise they could provide by empowering the business domains to do great things with their data.
- A concern with not having broad standardization is bespoke solutions, which make broad reuse hard. There is also the challenge of people not being sure how much they can trust the data products. The DAIA team believes the tradeoff is worth it to drive initial buy-in with domain owners.
- Defining data products has been a struggle. There is a chicken-and-egg issue: 1) you need to understand who from the business should be involved in designing a data product, but 2) data domains must first be discovered to know who the subject matter experts from the business are.
- HERA looks for data products to first serve their owning domains. This can be a slippery slope, as domains may hold valuable information that isn't useful for their own analysis, so other domains can't get to that data. Getting domains to freely share their data is a common incentivization problem.
- Purely technically focused data products will probably not serve demand. We need to focus on sharing information - what is the data saying, what is it about? Information is more than just the 1s and 0s of data.
Gruppo HERA had developed a sophisticated and reliable way to report to regulators but had not focused nearly as much on their internal data and analytics.
But a growing number of experiments (spanning BI to AI) emerged where data started being used to drive the company. About 2.5 years ago, HERA created a new team around data, analytics, and intelligence automation (DAIA) to rectify that and bring their data and analytics up to the same level as their regulatory reporting. As Luca said, they "were ready to scale data governance".
One key change, per Luca, was that the business had embraced a digital workplace program, so teams were able to create small-scale applications to fill gaps where business processes were not yet digitized. While small-scale apps may not be scalable in the long run, they still gave people a good sense of the benefits of a more digital or data-native approach.
Thus far, the most important aspect of change management around data for HERA - at least from Luca's view - has been not making the DAIA guidelines mandatory for domains while still helping domains understand what good looks like. This has meant better conversations, where the DAIA team can focus on listening and responding to the issues domains have instead of enforcing a rigid set of rules. Luca's team has seen a lot of success working with teams to deliver the "right thing" with a lot of flexibility in how that is achieved. The DAIA team can make people feel seen and heard, which gives the team a good way to influence direction. The current risk or challenge is the wide variance in data product quality; but overall, the DAIA team views the approach as a success, mostly owed to not being overly rigid. They have created a better sociotechnical environment for innovation.
Patience is a crucial part of the DAIA team's long-term strategy, per Luca. They are helping domains address their current needs while constantly rearticulating the overall vision. They are also helping teams by continuing to support active projects so there can be a good transition to new ways of working. The new data strategy manifesto wasn't a bolt of lightning - the DAIA team had been working with domains to move them towards the manifesto's approach, so it already aligned with the domains' ways of thinking.
Speaking of technical and organizational challenges, Luca mentioned that historically, data ownership was mostly a technical thing owned by a centralized IT team. Domains at most owned very high-level concepts, and IT owned the rest. So when they moved to the new approach, domain owners often reacted with fear at first. But once they got over the initial fear, they saw the power that owning their data has to make it a great resource for the company. IT also initially reacted with fear but started to buy in when they saw how they could empower the business users. This process is also progressively moving things out of siloed ways of working towards cross-functional, Agile teams.
Scott expressed his concerns about what a too-flexible approach means for reuse: first on the data side, if there is a wide variety of data products, and second in finding the reusable patterns necessary to scale a data mesh implementation.
Luca reiterated that the DAIA team is neither IT nor business, so they have to partner with both sides to get things done. Their strategy is to use the guidelines to make the standardized way the easy pathway, but not to force the domain owners to comply. They partner with them, give them guidance, and only try to influence them to do things the right way.
Per Luca, by laying out the tradeoffs in an honest way, you can help domain owners understand why you recommend a certain way. By letting people decide, they commit more to making it work.
As for defining data products, Luca mentioned how much of a struggle it has been internally. Many organizations implementing data mesh are struggling with this, including the very high-level questions of how big data products should be and how many there should be. Should they mirror the source system? (In general, that's an anti-pattern.) How do they encapsulate the domain's subject matter expertise into their data products? There is a chicken-and-egg issue of needing to understand who from the business should be involved in designing a data product, but needing to design the data product first to know who the subject matter experts in that domain are.
HERA is using two different types of data products: source-aligned and consumer-aligned (a hypothetical sketch of the two types follows these notes). Consumer-aligned data products - which they call Consumption Data Products (CDPs) - are designed to serve a specific use case. These are fit-for-purpose data products, and there is often a working-backwards process to figure out which domains need to deliver what once the use case is established. There is also a focus on limiting the scope of a data product so it doesn't get too complex or complicated to create or maintain. The source-aligned data products - which they call Domain Data Products - are built initially to "power" CDPs (use cases) but are also designed to be more general purpose.
For Luca, data products must first serve the owning domain, as it is hard to find domains altruistic enough to create data products simply to share with other teams. This can be a slippery slope, as there are likely many use cases where a data product - or even a small part of a source-aligned data product - is not useful to the data-owning domain but is extremely useful to other domains. Incentivization can be very difficult.
When it comes to Domain Driven Design (DDD), Luca's experience is that when you first share the definitions of domains, many people create an extremely complex picture of what a domain is. He recommends making your best guess at domains and moving forward, not getting overly exact. It's okay to make some initial guesses and work with the domain to define the boundaries. He also mentioned that data products that are purely technical solutions won't satisfy the data product owner's demand for information, so focus on delivering a complete product - not just data, but the information about what it is you're sharing.
Luca wrapped up with some thoughts on how crucial it is to work on the organizational operating model, to try to embrace domain driven design for data, and to be only as rigid in your guidelines as the organization can handle - guidelines that are too rigid can be seen as regulation without value, so start less rigid than you'd like. If domains see the rigid way as creating no value, they will push back and often deliver nothing. "Perfect nothing is still nothing," as Luca said.
Data Mesh Radio is hosted by Scott Hirleman.
If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
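To make the CDP/DDP distinction concrete, here is a hypothetical sketch of the two data product types as simple descriptors. The type names come from the episode, but the fields, domain names, and example products are invented for illustration.

```python
# Hypothetical descriptors for HERA's two data product types:
# source-aligned Domain Data Products (DDPs) and consumer-aligned
# Consumption Data Products (CDPs).
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataProduct:
    name: str
    owner_domain: str
    kind: str                 # "domain" (source-aligned) or "consumption"
    serves_use_case: str = "" # set for fit-for-purpose CDPs
    powers: List[str] = field(default_factory=list)  # CDPs a DDP feeds

# A source-aligned DDP: built first to power a CDP, but kept general
# purpose so later consumers can reuse it. (Invented example.)
meter_readings = DataProduct(
    name="meter-readings",
    owner_domain="energy-distribution",
    kind="domain",
    powers=["billing-anomalies"],
)

# The fit-for-purpose CDP it powers, designed backwards from a
# specific use case. (Invented example.)
billing_anomalies = DataProduct(
    name="billing-anomalies",
    owner_domain="billing",
    kind="consumption",
    serves_use_case="flag anomalous invoices for review",
)
```

The working-backwards flow the episode describes maps onto this: establish the CDP's use case first, then derive which DDPs the upstream domains need to deliver.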

Jun 13, 2022 • 1h 2min
#88 Data Engineering and Data Engineers' Future in Data Mesh - Interview w/ Joe Reis
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.
Links:
Joe's LinkedIn: https://www.linkedin.com/in/josephreis/
Ternary Data Website: https://www.ternarydata.com/
Monday Morning Data Chat: https://anchor.fm/ternary-data
Joe and Matthew Housley's interview with Zhamak Dehghani: https://www.linkedin.com/video/event/urn:li:ugcPost:6915063013410582528/
Joe's upcoming book, "Fundamentals of Data Engineering": https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/
In this episode, Scott interviewed Joe Reis, CEO/Co-Founder of data consultancy Ternary Data, co-host of the Monday Morning Data Chat, and author of the upcoming book Fundamentals of Data Engineering.
Some key points or takeaways, specifically from Joe's point of view (not necessarily those of the podcast):
- Find quick, high-value wins. Too often people focus on the big wins, and those become overly complicated and end in failure.
- Most software engineers don't understand data well enough to be data product developers in data mesh, at least yet.
- Data mesh is a polarizing topic. That makes sense, as it is pushing boundaries. Many hope it can come to fruition, but it is a bit of a utopian view.
- The future of data engineering is to move past managing pipelines to much higher-value work.
- Speed to achieving wins with data - with a clear return on investment and trust - is the first thing to focus on. Get this right and you earn the "luxury" of building great data products.
Joe started by discussing the somewhat nebulous area between software engineering and data that data engineering has always occupied: sitting between the source systems and the data output, converting the data in the source systems into something consumable for data users. Previously, that was mostly about making sure reports got pushed through, and you hoped people derived insights. Now it's more about pipelines. But the way we store information in source systems is not the format or shape we need for analytical purposes, so there needs to be a go-between (a toy illustration follows these notes).
A big trend in data engineering currently, for Joe, is the abstraction of tooling. That can be good - it makes people more productive - or bad - it makes it harder to understand what is actually happening under the covers. For Joe, it's probably worth using the abstractions: they do the heavy lifting, and data engineers can focus on higher-value work. We might be coming to the end of the "pipeline monkey" era of data engineering, letting us shift more focus to the data output, DataOps, orchestration, security, etc.
For Joe, the biggest value-add a data engineering team can have is getting wins quickly. When asked about speed to returns versus repeatability, Joe said speed is more important, especially when you are trying to prove out the value of your data team.
Trust is crucial, so you have to be careful not to move too fast, but trying to do big-bang projects is often a recipe for failure in his view.
When asked what the signs are that an organization is ready to implement data mesh, Joe mentioned that if an organization is already seeing "wins" with data across a number of teams/domains, that's a very good sign. But you can't have only a few teams getting those wins, as that means overall organizational data maturity is still probably low.
Joe made a good point about how polarizing data mesh can be. When he speaks with some organizations, a few leaders simply reject the idea outright. Many others simply don't see data mesh as ever being possible specifically in their organization. And that is probably true - low-sharing / low-empathy organizations need cultural change BEFORE trying to implement data mesh, or the implementation will likely fail. Others, including Joe, see data mesh as a bit of a utopian vision - "imagine a world where..." - and that's pretty common. But Joe made two good points there: 1) if it were a safe concept, it would already be obvious, and large change concepts like this cause concern; and 2) just because we can't necessarily achieve the ideal doesn't mean we can't strive for it - like "being a good person", being a better person is still a win even if you never become the best person possible, right?
A recurring theme throughout the conversation was the need for speed relative to data. Identifying and then executing on quick wins is crucial for data teams in Joe's view. So first, data teams need to learn how to identify those opportunities to build momentum around the data organization as a profit and innovation center instead of a cost center. As Joe said, "you have to see problems before you can fix them".
When working with software engineers to teach them data engineering skillsets, Joe has seen that software engineers can often easily pick up many of the mechanisms used in data engineering, e.g. managing the pipelines themselves. But they have little understanding of what data consumers want. So it might be too early for most organizations to have their software engineers be the main data product developers for data mesh. Joe regularly sees that software engineers not only don't understand data, as stated above, but often don't care to either. It can be easier to teach data analysts and data scientists data engineering, because they understand what data consumers really need.
Wrapping up, Joe again circled back to the need to find high-value wins quickly in data. He recommends not getting too complicated - look for the small wins. And look at what you want to produce and somewhat work backward from there.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
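As a toy illustration of that go-between role, the sketch below reshapes nested, transaction-friendly records into a flat, analytics-friendly aggregate. The records and field names are invented, not from the episode.

```python
# Toy illustration of the "go-between": operational records are shaped for
# transactions, not analytics, so a data engineer reshapes them.
from collections import defaultdict

# Operational shape: one nested document per order - convenient for the app.
orders = [
    {"order_id": 1, "customer": {"id": "c1", "region": "EU"},
     "items": [{"sku": "A", "price": 10.0}, {"sku": "B", "price": 5.0}]},
    {"order_id": 2, "customer": {"id": "c2", "region": "US"},
     "items": [{"sku": "A", "price": 10.0}]},
]

# Analytical shape: flat revenue-per-region aggregate - convenient for BI.
revenue_by_region = defaultdict(float)
for order in orders:
    for item in order["items"]:
        revenue_by_region[order["customer"]["region"]] += item["price"]

print(dict(revenue_by_region))  # {'EU': 15.0, 'US': 10.0}
```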

Jun 12, 2022 • 29min
Weekly Episode Summaries and Programming Notes - Week of June 12, 2022 - Data Mesh Radio
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Jun 10, 2022 • 1h 2min
#87 Choosing Tech for the Now and Future and Potential Woes of Decentralizing Data Teams - Interview w/ Jesse Anderson
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.
Relevant Links:
Jesse's Data Teams Book: https://www.amazon.com/Data-Teams-Management-Successful-Data-Focused-ebook/dp/B08JLFTPBV
Big Data Institute website: https://www.bigdatainstitute.io/
Data Dream Team podcast: https://sodapodcast.libsyn.com/site
Jesse's LinkedIn: https://www.linkedin.com/in/jessetanderson/
In this episode, Scott interviewed Jesse Anderson, Managing Director at consulting company Big Data Institute, host of the Data Dream Team podcast, and author of three books, most recently Data Teams.
To start, a few takeaways from Jesse's perspective on choosing technology:
- Make sure you have the right team in place to make good technology decisions - the team needs to come first.
- Before selecting any technology, it's crucial to understand what you are trying to accomplish. And understand that the technology will help address the challenge but won't solve anything by itself.
- Focus on: is this the right tool or solution for us now and in the future? What is the roadmap and vibrancy of the solution?
- "Technology must earn its keep" - you should understand the total cost of ownership and your expected return on investment.
- Data tooling cycles are probably 10 years at the most - prepare for obsolescence so you aren't overly reliant on any one technology.
And some takeaways from Jesse's point of view on decentralizing data teams:
- Currently, software engineers aren't ready to be data product developers, so you'd need embedded data engineers to handle creating and maintaining data products in data mesh.
- But many data engineers are not willing to be embedded into domains.
- Managing the dotted line versus solid line of reporting between a functional team and the domain is very difficult.
- There are a number of cracks that crucial data can fall into, failing to find a good owner in a decentralized structure - especially aggregate data products.
Jesse started the conversation on how important people are to getting things right with data, especially making technology decisions. The chicken-and-egg question: do you need the right people in place first, or do you make technology decisions that will attract the right people? In Jesse's view, you need the right people first, as they will be the ones to make the right technology selections. The most important question for Jesse when selecting technology is what you are trying to accomplish with it. If you don't focus on the target outcome, things are not going to work out well. And you should know, in general, what most of your use cases for the technology will be - use that to assess which technology is the right choice.
Also, for Jesse, "technology must earn its keep". Just because you decided to use a technology at one point doesn't mean it gets to stay; it must continue to be worth more than its cost.
And you want to strongly factor in your long-term total costs, as best you can estimate them, when looking at adding a technology. This matters for build versus buy: can you keep the thing running, does the long-term roadmap match your goals and vision, etc.
Jesse also pointed to how different the data side is from the operational side with respect to technology cycles. Considering Hadoop, where Jesse focused in his time at Cloudera, 10 years - or even less - is realistic for how long data technologies might be around. Thinking in those cycles, you should consider where a technology is and where it is headed when choosing: what is the chance of obsolescence? How healthy is the project? You must have a longer-term vision, beyond whether it solves today's problems.
You should also consider how aggressive you will be in tech adoption, per Jesse. Are you comfortable making early bets? How can you set yourself up to migrate away once a technology is no longer a great fit? Data mesh can make it easier to wean off a technology, because what you expose to data producers and data consumers is an interface rather than the underlying tech.
Jesse talked about how, right now, general software engineers / application developers are not ready or able to create good data products. One big issue is a lack of understanding about schema changes: on the one hand, you can't tell software engineers they can't make schema changes, because that blocks application development; on the other, most software engineers do not understand the downstream impact of those changes (a toy example follows these notes). They are also, per Jesse, not well enough versed in how to store and share data about the domain to 1) maximize reuse and 2) create datasets that will be useful for analytics.
Aggregated domain ownership is one issue Jesse pointed to regarding decentralization of data teams: who owns these products? Do they need to be products? Another aspect, one that has run through many conversations on the podcast: if we give domains the authority to do whatever they want, won't that cause chaos? Probably. So establishing best practices and giving people a common platform and reusable frameworks is necessary to make something like data mesh work.
Another issue Jesse sees with team decentralization is managing the career growth and happiness of data engineers. Many data engineers may not want to be embedded in domains. And do they follow the organization's best practices or, if the domain owner says to do something quickly without adhering to best practices, who do they listen to?
Jesse finished by saying all your data work should have a purpose. Every organization should ask whether data mesh is truly worth it for them, both now and in the future. It's okay to say not now. It's okay to say not ever.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
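To make the schema-change hazard concrete, here is a toy example - all event shapes and names are invented - showing how an application-side rename breaks a downstream consumer, and how exposing a stable interface (rather than the underlying shape, in the spirit of Jesse's point above) insulates consumers.

```python
# Toy example of the schema-change hazard: the app team renames fields for
# good application reasons, and a consumer coded against the raw schema
# breaks - a failure the producer never sees.
producer_event_v1 = {"user_id": 42, "amount": 9.99}
producer_event_v2 = {"uid": 42, "amount_cents": 999}  # app-side rename

def report_revenue(event: dict) -> float:
    # Downstream analytics, coded against the raw v1 schema.
    return event["amount"]

print(report_revenue(producer_event_v1))  # 9.99
# report_revenue(producer_event_v2)       # KeyError: 'amount'

# Mitigation sketch: the producing domain maintains a stable contract and
# maps internal schema changes into it, insulating consumers from churn.
def to_contract(event: dict) -> dict:
    if "uid" in event:  # v2 internal shape
        return {"user_id": event["uid"], "amount": event["amount_cents"] / 100}
    return event        # v1 already matches the contract

print(report_revenue(to_contract(producer_event_v2)))  # 9.99 again
```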

Jun 7, 2022 • 14min
#86 Data Product Documentation - A Primer? - Mesh Musings 18
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf