
Data Mesh Radio

Latest episodes

Aug 17, 2022 • 12min

A Call to Action - Please Consider Being a Guest!

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Episode list and links to all available episode transcripts (most interviews from #32 on) here.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, and/or nevesf
Aug 16, 2022 • 11min

#114 Protecting the Meaning of Data Mesh - Mesh Musings 26

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Aug 15, 2022 • 1h 19min

#113 Data Governance In Action: What Does Good Governance Look Like in Data Mesh - Interview w/ Shawn Kyzer and Gustavo Drachenberg

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.
Gustavo Drachenberg's LinkedIn: https://www.linkedin.com/in/gusdrach/
Shawn Kyzer's LinkedIn: https://www.linkedin.com/in/shawn-kyzer-msit-mba-b5b8a4b/

In this episode, Scott interviewed Shawn Kyzer, Principal Data Engineer, and Gustavo Drachenberg, Delivery Lead, both at Thoughtworks. Both have worked on multiple data mesh engagements, including with Glovo starting 2+ years ago. From here forward in this write-up, S&G will refer to Shawn and Gustavo rather than trying to specifically call out who said which part.

Some key takeaways/thoughts from Shawn and Gustavo's point of view:
- It's very easy for centralized governance to become a bottleneck. Make sure any central governance team/board that is making decisions has a way to quickly work through its backlog via good delegation. Not every decision needs deep scrutiny from top management.
- To do federated governance right, you need to enable the enforcement - or often more appropriately the application - of policies through the platform wherever possible. Take the burden off the engineers to comply with your governance standards/requirements.
- Domains should have the freedom to apply policies to their data products in a way that best benefits the data product consumers. So if there are data quality standard policies, a data product should adhere to the standard for measuring completeness as an aspect of data quality but might be optimized for something other than completeness.
- The cost of getting anything "wrong" in data has historically been quite high because of how rigid things have been - the cost of change was high. But with data mesh, we are finding new ways to lower the cost of change. So it is okay to start with policies that aren't complete and will evolve as you move along.
- If you have an existing centralized governance board, that will sometimes make moving to federated governance ... challenging at best ... so you will need a top-down mandate to reshape the board. Look to meet the necessary representation across your capabilities (e.g. product, security, platform, engineering, etc.) without creating a political issue if possible.
- Look to add incremental value through each governance policy. And look to iterate quickly on policy decisions where you can. Create a feedback loop on your policies to iterate and adjust. It's okay to not get your policies perfect the first time; you can adjust them.
- Really figure out what you are trying to prove out in your initial proof of value/concept. If it's full data mesh capabilities, that can easily take 4-6 months.
- An interesting incremental insight: Zhamak has warned about organizations trying to scale too fast as an anti-pattern that may result in lots of tech debt or a failure of your implementation.
- An interesting incremental insight: in all of the data mesh implementations S&G have worked on thus far, the initial data product has not had any PII, as that adds significant complications - probably beyond the value-add of including PII in most cases.
- Your data mesh implementation team should be 1-2 people from every necessary capability.
- Data mesh is a large commitment - resources, time, focus, etc. - so you need to be prepared to fund it for the long haul. This isn't an initial big-bang approach. But this is also why you should focus on continuous incremental value delivery once you get to delivering data products, to keep up momentum.
- You will get things wrong as you move forward with your data mesh implementation. Look to limit the blast radius, but it's absolutely fine and expected that you will learn and improve. Data mesh gives people flexibility, and flexibility allows for making changes. Set up fast feedback loops and look to iterate rather than trying to get it perfect the first time. Perfect is the enemy of done.

S&G started off giving the four general states of data governance in most organizations: none, centralized, decentralized, and federated. Many organizations, even quite large ones, have little to no major data governance oversight. As previous guests have mentioned, many get fed up with data governance only being a cost center - especially if it doesn't even offer much risk mitigation or regulatory compliance - and essentially do away with their data governance. Decentralized data governance is an anti-pattern in general, with each domain or line of business coming up with their own approaches, making collaboration across domain boundaries difficult at best - it's like each domain is speaking a different language entirely. Many companies move to a centralized approach, but that often quickly ends up becoming a blocker without pretty specific controls in place. Rigid plus low throughput isn't great. Hence why data mesh pushes for federated governance - governance with a central group to make necessary decisions and policies, but where the people who understand best actually apply the policies to their work - namely the data products.

So, per S&G, the federated governance structure in data mesh should generally be a centralized board or team representing the many different constituencies throughout the organization necessary to make smart and informed decisions about policies. Then the policies are codified - and coded - into the platform for domains to easily apply to their data products. The centralized team should focus on making quick decisions by delegating policy research and development to people within each of their own constituent groups - e.g. software engineering, platform, product, security, legal/compliance, etc. That way, the leaders on the centralized board don't need to have all the context themselves to make smart decisions, as the people they delegated to can ensure their constituent group's needs are met. And the application of policies to data products at the domain level is made easy - or at least far easier - through automation. This setup gives the domains more freedom in how they apply the policies to their data products.
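To make "codified - and coded - into the platform" a bit more tangible, here is a minimal sketch of what platform-applied policies could look like. It is purely illustrative - the DataProduct fields, policy names, and checks are assumptions for this write-up, not anything S&G described:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical descriptor a domain team would publish with each data product.
@dataclass
class DataProduct:
    name: str
    domain: str
    output_ports: list[str] = field(default_factory=list)  # e.g. ["s3", "kafka"]
    catalog_registered: bool = False
    completeness_pct: Optional[float] = None  # measured share of non-null required fields

# Centrally agreed policies, coded once by the platform team and applied
# automatically. Domains only supply the metadata above.
def policy_registered_in_catalog(dp: DataProduct) -> list[str]:
    return [] if dp.catalog_registered else [
        f"{dp.name}: output ports must be registered in the data catalog"
    ]

def policy_completeness_measured(dp: DataProduct) -> list[str]:
    # The central standard says completeness must be measured and published;
    # each domain decides what level its consumers actually need.
    return [] if dp.completeness_pct is not None else [
        f"{dp.name}: completeness must be measured and published"
    ]

PLATFORM_POLICIES = [policy_registered_in_catalog, policy_completeness_measured]

def apply_policies(dp: DataProduct) -> list[str]:
    """Run every central policy; return the violations for the domain to fix."""
    violations: list[str] = []
    for policy in PLATFORM_POLICIES:
        violations.extend(policy(dp))
    return violations

if __name__ == "__main__":
    dp = DataProduct(name="orders", domain="fulfillment", completeness_pct=0.97)
    print(apply_policies(dp))
    # -> ['orders: output ports must be registered in the data catalog']
```

The design point is the one S&G describe: the central board agrees on a policy, the platform team codes it once, and every domain gets the check for free - the domain only decides how best to satisfy it for its consumers.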
On greenfield versus brownfield for data governance, of course greenfield - meaning little to no data governance in place - is typically far easier according to S&G. It can be disconcerting to see large organizations with very large data practices and little governance, but it's at least easier to only have to focus on creating and training instead of evolving and unlearning too. Either way, to move forward, look to build out the CYA - cover your... butt... - aspects of data governance first and work to build a minimum viable data governance board. Then you can start to ask about needs and create a backlog to start working through. But again, make sure to focus the board on making decisions and impact, not on being a political entity. Easier said than done, but showing them how to make decisions quickly and efficiently is great. And, with data mesh, policies can be changed or enhanced later - you don't have to get it perfect at the start.

If you are in a brownfield deploy of your governance board, it can be a political minefield, per S&G, as there may be overrepresentation of certain teams. But you need to work to have the right representation of needs, the right diversity of capability. There needs to be a top-down mandate to really reshape the way your board is composed so you can get to that fast decision-making capability. As a reminder, this is somewhat counter to what Laura Madsen recommended in her episode but aims for the same outcome. Possibly look to disrupt your governance if it ever becomes too slow and the bottleneck.

So, you've got your governance board together - how do you get going for something like data mesh? According to S&G, you should first focus on policies that positively impact the technical people, e.g. that all output ports on your data products should be registered in the data catalog. And it's okay to not get your policies 100% correct upfront; you can adjust. Use a feedback loop to take in information about missing policies or currently deployed policies - are they meeting people's needs? Every policy should have incremental value. Security is obviously a policy area that could be considered cost-only, but it's still quite important to address, and risk mitigation is a value-add.

For S&G, as previously mentioned, delegation is crucial to keep things moving. If there is a truly important decision with major implications, possibly the leaders of the different capabilities represented on the board need to get more deeply involved. But for most policies, those heads should delegate as much as possible to people they trust to represent their interests and move forward. We don't need the end approvers to be overly involved in routine decisions. Much like when purchasing a solution, the CFO typically doesn't need to be involved in specifically deciding which data catalog to use if it is a small portion of the budget - the experts did the work and made a selection; you delegated to them for a reason, presumably.

S&G gave some advice around getting started in your data mesh journey. When they were working with Glovo, company management gave the team the time and budget to really build out the platform in tandem with the governance and the first data product. That took 6 months. And coordinating across all four pillars moving forward simultaneously was certainly not easy. If you don't have that amount of time and budget, you can do a relatively smaller proof of value/concept in probably 3-4 months; but Zhamak has warned of premature scaling causing a fair bit of issues for a number of companies trying to implement data mesh, so trying to rush your proof of value/concept might not be the best idea. They also mentioned a pattern of the first data product at their clients not having PII, as that complicates your initial platform needs for governance. And pick a relatively simple source-aligned use case as your first data product.

As to who you should have on your data mesh "tiger team" - if you are lucky enough to have some full-time heads to staff it - S&G recommend having 1-2 people from each necessary capability: 1-2 data engineers to help build out the platform and upskill your domain team, 1-2 folks on the governance team, 1-2 from product or elsewhere to do the data product management, etc. And then, obviously, the domain you are working with needs to be heavily involved. Whoever is on the team, prepare to do a lot of data product/data mesh evangelism.

It's important to understand that committing to data mesh is a big long-term commitment, financial and otherwise, per S&G. Your implementation can't be a skunkworks approach; you have to be committed to moving forward together so you can drive the necessary buy-in. And it isn't just the initial implementation - you have ongoing growth of your implementation and maintenance. This is partially why so many guests have mentioned delivering continuous incremental value to make it easy to secure additional necessary funding.

When asked about what parts of your federated computational governance should be in the platform versus at the data product level, S&G believe you should always look to create the affordances and the easy path in the platform. The application of policy via the platform is the best way to ensure compliance and also standardization, which makes it easier on data consumers. But any decision relative to the specific context or needs of the explicit product should be made at the product level. So the decisions about how to measure data quality characteristics would be at the platform level, but the SLAs to meet for a data product would be set at the data product level itself by the domain team.

In wrapping up, S&G wanted to reiterate that data mesh isn't easy if you want to set yourself up for long-term success. It is going to take a lot of effort to get it going and deliver your initial data product, platform, and governance policies. But by spending the time to do it right, you set yourself up for gaining a lot of momentum. Don't get discouraged. And be prepared to get things wrong and then fix them - that's totally okay. Play, learn, iterate, improve.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Aug 14, 2022 • 33min

Weekly Episode Summaries and Programming Notes – Week of August 14, 2022

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Aug 12, 2022 • 1h 13min

#112 Driving Buy-In and Finding Early Success - Kiwi.com's Data Mesh Journey - Interview w/ Martina Ivaničová

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.
Martina's LinkedIn: https://www.linkedin.com/in/martina-ivanicova/

In this episode, Scott interviewed Martina Ivaničová, Data Intelligence Engineering Manager at the travel services company Kiwi.com.

Some key takeaways/thoughts from Martina's point of view:
- The most important - and possibly one of the most difficult - aspects of a data mesh implementation is "triggering organizational change".
- Driving buy-in for something like data mesh is obviously not easy. As you are getting started, look to leverage 1:1 conversations to really share what you are trying to do, why, and how this can impact them and the organization. These 1:1 conversations are crucial to developing early momentum.
- On driving buy-in for data mesh, really think about how to limit incremental cognitive load as much as possible on developers/software engineers. If you can keep cognitive load low, you are much more likely to succeed - succeed in driving buy-in and succeed in delivering value.
- When sharing internally about data mesh, it's important to focus on what it means to the other person. Using "data mesh" as a phrase can lead to a lot of confusion for people not on the data team. Make it clear what you are trying to accomplish - the what, the why, and the how. Using data-as-a-product as the leading concept resonated and worked well.
- Kiwi.com started driving buy-in by working with engineering upper management, then found a few valuable and achievable first use cases to move forward. And they have kept cognitive load low on the engineering teams while those teams learn how to deliver data as a product.
- If possible, the easiest way to drive buy-in is by finding a use case that is beneficial to the producing domain. If not, then look to spend the 1:1 time to really share why this matters.
- Kiwi.com is getting software engineers in domains to commit to simply sharing their data, not even really structuring it into data products. So the software engineers in most cases are really only focused on maintaining high-quality data sharing mechanisms - read: pipelines. That is a relatively low initial cognitive load/low workload ask.
- Analytics engineers are creating the data products from the sourced data to satisfy consumer needs. Martina and team want to move to software engineers handling more of data product creation/management over time, but it's a process. They plan for analytics engineers to upskill the software engineers by pairing with them closely.
- It might initially be more important to find a way to evaluate and iterate on what data is shared and how than to get to the most complex or valuable data product. You want to build the muscle around sharing data first before trying to go too big too soon.
- It's important to know what you are trying to prove out in your initial data mesh related deployment. It's okay to prove out that you can produce data products before proving out that you can build the full mesh.
- A key success metric for a data mesh journey could be how many direct conversations - and then actions - come from data producers and consumers speaking without data engineering involvement. At Kiwi.com, these conversations are still usually driven by analytics engineers, but that might change in the future.
- Data governance centralization didn't happen overnight. When you look to decentralize and federate your governance, you should look to be patient instead of trying for an overnight revolution.

Martina started by discussing how historical - legacy might be too harsh - data approaches like the enterprise data warehouse haven't kept up with the mass proliferation of data sources. When we were taking data from the monolith or monoliths, it was far easier to think about what data you might have and try to arrange it into something consumable. But now, with data coming from so many microservices and from external vendors and partners, it just isn't possible to use the same historical approach - too many things are changing. The centralized data team trying to own hundreds of pipelines flowing into one central lake or warehouse that they also own - it just wasn't scaling. So when the Kiwi.com data team ran across data mesh, it was very exciting - it was a way for the people with the business context to conceivably own and manage sharing their data in a reliable way.

The historical general approach to data governance - one centralized team trying to make context-dependent decisions for all the domains - just never made sense to Martina. They could never know the context well enough to make good choices, especially in a timely manner. She noted that if you are moving from that approach, centralization didn't happen in a day; it evolved. Your move to decentralization should also evolve - think thin slicing and decentralizing more and more rather than pushing all ownership to all domains at once.

Martina then talked about driving buy-in, a topic Scott circled back on frequently throughout the conversation. She noted - as many have, notably Khanh Chau in his episode - how hard driving buy-in can be when people haven't felt the specific pain you are speaking to. So she and her team worked to really have deep conversations with the software engineers about how important treating and sharing data as a product can be and how the data team will work to maintain low cognitive load on the software engineers.

So, how did they start driving buy-in? First, Martina and team worked with engineering upper management to make sure that as they moved forward with domain teams, they would have support. Then, they focused on finding good first use cases. What could be a use case that would drive significant value if they got it right, where they could also limit incremental cognitive load on the software engineers? And what had a high likelihood of success to start to build out proof points and momentum?

Martina mentioned how truly crucial the low initial cognitive load aspect was to driving their data mesh journey forward. The central data team wanted to spend at most 2-3 days with software engineers to teach them how to share data. Is that going to be them creating actual data products on their own? Quite frankly, no. It was about teaching them how to share data and, probably more importantly in the long run, how to think about sharing data - that data product thinking. Then, analytics engineers structure the shared data into actual data products. This setup means it is easy to evaluate and iterate along the way.

Scott asked what they were trying to prove out initially - that a dataset had value, that they could build data products, or that they could build a data mesh? For Martina and team, it was more about building a reliable way to share data, so their proof of value was focused on proving they could build data products. One really crucial aspect they wanted to test was whether they could bring the data producers and consumers together with a good outcome without the data engineers - so the producers, the analytics engineers, and the end consumers working together. And the answer is yes, they are seeing great results there! The direct relationships between data producers and consumers are spurring the data producers to rethink how they share and what they share - and, very importantly, what more data they could share.

Martina, like many other guests, brought up the general industry need for redefining data contracts - they just don't do a ton of the things we need. We don't have a good way to detect semantic drift or often even to prevent changes before they break something. Even with lots of existing tooling, it's difficult for data producers to see who is consuming their data - and almost impossible for them to know specifically how consumers are using it. And there are so many more issues that should be wrapped into contracts.
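As a rough illustration of the gap Martina is describing, here is a minimal, hypothetical data contract check - the field names, the FieldSpec shape, and the drift heuristic are invented for this write-up, not Kiwi.com's actual tooling:

```python
from dataclasses import dataclass

# Hypothetical contract a producing team publishes alongside its data.
@dataclass(frozen=True)
class FieldSpec:
    name: str
    dtype: str
    nullable: bool = False

CONTRACT = {
    "booking_id": FieldSpec("booking_id", "string"),
    "price_eur": FieldSpec("price_eur", "float"),
    "created_at": FieldSpec("created_at", "timestamp"),
}

def breaking_changes(proposed: dict[str, FieldSpec]) -> list[str]:
    """Flag contract-breaking schema changes before they reach consumers."""
    breaks = []
    for name, spec in CONTRACT.items():
        new = proposed.get(name)
        if new is None:
            breaks.append(f"removed field: {name}")
        elif new.dtype != spec.dtype:
            breaks.append(f"type change on {name}: {spec.dtype} -> {new.dtype}")
        elif new.nullable and not spec.nullable:
            breaks.append(f"{name} became nullable")
    return breaks

# Schema checks are the easy part. The semantic drift Martina mentions - same
# schema, changed meaning - needs monitoring of the values themselves, e.g.
# alerting when a field's distribution shifts sharply between loads.
def semantic_drift(old_mean: float, new_mean: float, tolerance: float = 0.25) -> bool:
    return abs(new_mean - old_mean) > tolerance * max(abs(old_mean), 1e-9)

if __name__ == "__main__":
    proposed = dict(CONTRACT)
    proposed["price_eur"] = FieldSpec("price_eur", "string")  # breaking change
    print(breaking_changes(proposed))  # ['type change on price_eur: float -> string']
    print(semantic_drift(52.0, 8.0))   # True: schema unchanged, meaning drifted
```

Even a sketch like this makes the limits obvious: it says nothing about who is consuming the data or how, which is exactly the part Martina notes is almost non-existent in today's tooling.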
In circling back on buy-in, Martina talked about how in a brownfield deployment, there are puts and takes. Negatives include dealing with the issues of existing tech debt, difficulty getting prioritization, etc. But a positive is that you already have an existing backlog of requests where you can find some interesting use cases to try out for your data mesh proof of value/concept. As part of driving buy-in for the proof of value/concept, Martina and team had to do a lot of 1:1 conversations. It can be frustrating to have to do so, but these conversations are crucial to building initial momentum.

Martina had some issues when she tried to explain to non-data people internally that they were doing data mesh; it's so easy to get confused by data mesh, especially for those same people. So she created a 1-pager focused on the data-as-a-product concept to help people understand what they were trying to accomplish. Focus on informing people of the what, the why, and the how. Data mesh is more of an implementation detail to them. This is what Scott keeps referring to as "unicorn farts" - in every bit of internal documentation for consumption outside the data team, copy+find+replace "data mesh" with "unicorn farts". Because then you will delete every mention of data mesh - and unicorn farts - so you can focus on what actually matters to the other party.

Martina shared the current role structure of their data mesh journey: data engineers focusing on the data platform, analytics engineers building data models on top of source-aligned data to create consumer-aligned data products, software engineers focusing on sharing source-aligned data, and data consumers producing aggregated data models across different data products. It is difficult to say they are building full source-aligned data products as of yet, as they are still training their software engineers to really work with data and use data-as-a-product thinking. Remember, they are training them on sharing data for 2-3 days total - you can't bring someone that far along in learning how to handle data in half a week! Their goal is to embed the analytics engineers further into the domains to really upskill the software engineers more around data, but it's early days.

It might feel a bit obvious but it's good to say out loud, per Martina: triggering organizational change is the most important part of getting your data mesh journey moving. It will be difficult to get moving, but trying to build out your platform early or trying to get teams to create data products without the organizational support is very likely to fail. You need to get that organizational change going.

A few other interesting points to highlight:
- A key initial success criterion was seeing software engineers start to consider what additional data they could share that could be useful and how they could share it reliably.
- Buy-in for building a data product is obviously easiest when the domain will be the consumer, but when that isn't the case, the 1:1 conversations and having upper engineering management buy-in made it possible to get domains to do the work.
- Figuring out who owns data brought - or bought - in from the outside can be extremely difficult. Who wants to own often low-quality data from an external source that needs to be cleaned and made into a product?
- Kiwi.com leveraged their existing stack for their initial data mesh implementation. There are of course lots of missing capabilities, but they can still deliver good incremental value without every piece of the platform in place.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Aug 9, 2022 • 8min

#111 Applying Data Mesh Principles to Your Real-Time/Operational Systems - Mesh Musings 25

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Aug 8, 2022 • 1h 7min

#110 Disrupting - Not Destroying - Your Data Governance to Drive Incremental Value - Interview w/ Laura Madsen

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.
Laura's book, Disrupting Data Governance: https://smile.amazon.com/Disrupting-Data-Governance-Call-Action/dp/1634626532
Laura's LinkedIn: https://www.linkedin.com/in/lauramadsen/
Moxy Analytics website: https://www.moxyanalytics.com/

In this episode, Scott interviewed Laura Madsen, CEO at Moxy Analytics and author of the book Disrupting Data Governance.

For the purposes of this write-up, "data governance" refers to the way many large organizations handle data governance at scale - a way that is very rigid and causes bottlenecks. We all know we can't stereotype or group every org together, but general trends can be observed.

Some key takeaways/thoughts from Laura's point of view:
- A big issue with today's data governance is that the concept of data stewards - the people who own the data concepts - is from 30 years ago and hasn't changed much despite the demands and scope changing dramatically.
- The data governance committee/council structure most organizations use is inherently inflexible and ineffectual. Those making the decisions don't really understand what's happening with the data under the covers, and those who do understand have little ability to influence the wider committee outside their own domain. And thus, they become a major bottleneck.
- Data governance committees can be quite useful if they focus on communication and context exchange rather than driving decisions and work forward.
- To drive change in your data governance practices, you need to disrupt but not destroy. Start to break down the big picture into much smaller, bite-sized chunks that will incrementally drive value as you improve on them - Agile provides a good framework to approach this.
- You will absolutely have to throw out a LOT of your current data governance practices - over time - as you replace them with better ways of working. You will need to really evaluate each practice and assess whether it will drive value or should be replaced.
- "Marie Kondo" your data governance practices - really look at your processes one by one and ask "does this spark value?" Reference: https://storables.com/storage-ideas/marie-kondo-method/
- Current data governance practices do not provide incremental value to most organizations; they are about compliance and risk mitigation. If you can drive value creation, you can more easily drive change. People want to enable value creation, or at least are hesitant to stop it, in most orgs. Look for small ways to drive incremental value to build momentum.
- The current data steward and data ownership model essentially rewards inaction more than action. Action has risk, and risk mitigation is a large part of the data steward's and data owner's role. We need to change that relationship and reward enabling valuable use of data, within compliance.
- Laura is a fan of the hub-and-spoke model for data governance - and in general. To make hub-and-spoke work, 1) everyone has to really work on strong communication and 2) the central governance team cannot fall into the trap of trying to fix the data themselves; they must empower and enable the teams to fix their own data.
- Data governance teams must stop writing policies - compliance and InfoSec should be doing that. Policies become something that can be audited - don't give regulators the path to fining you.
- It's crucial to understand that there is a "good enough" in data governance, and it's often good enough only for right now. Find that line of good enough for now, look to reevaluate, and find places that aren't good enough - or are just barely good enough - to focus on. There will assuredly be lots.
- It will probably be very uncomfortable disrupting the way you do data governance - at first. Start to build that muscle memory with small, incrementally valuable changes.
- There must be balance between flexibility and rigidity in data governance. Too much flexibility causes chaos. Too much rigidity causes the pain you are probably feeling right now.
- Laura wants data governance professionals to know that she understands how difficult your role is and the work you do is very valuable. She sees you :D
- To start to change your data governance ways from woes to "woah, this is working", first start by rethinking who is accountable.

Laura started the conversation with her big question when it comes to data governance: how did we end up here? How did most organizations end up with a very rigid, not scalable, non-value-add data governance approach? How are we doing data governance essentially the same way as 30+ years ago? How can that make sense given all the changes and advancements in tech/software in the last 3 decades? For Laura, it really doesn't make much sense, and we should disrupt that model.

With the data steward model as it currently "works", the steward is someone in the business who has subject matter expertise but often has a hard time driving incremental change because of politics. So we need to work on flipping the incentives and role goals to drive incremental value from governance, instead of becoming a costly bottleneck with data stewards and owners preventing use instead of encouraging it.

For Laura, the biggest issue with data governance right now is the governance councils and committees. They typically have worthwhile goals, ones every organization should strive for. But the committee structure almost inherently means they will be ineffectual in driving high-value work. The data owners have no real line of sight into what's going on with the data, and the data stewards - who do have that line of sight - can't move forward without approval. And after a few meetings of nothing really getting done, the required decision makers often stop attending. So the committee holds useless meetings instead of actually pushing work forward.

Committees and councils can have a benefit in Laura's view if they are focused on communication instead of direct action. They can be great ways to share context internally, especially among key stakeholders.

Laura is a big fan of the hub-and-spoke model of organization to drive things forward with governance. The key to leveraging a model like hub-and-spoke is strong communication touch points between the centralized hub - the data governance team - and the spokes - the domains. A common failure point with data governance teams, especially in the hub-and-spoke model, is that the governance team tries to fix the data instead of enabling the teams to handle their own data. As Jay Sen reiterated: empower people, don't try to do their jobs. Hub-and-spoke can probably work relative to data governance in data mesh, but you have to be careful about what is centralized in the hub.

Look at the roles you have that support data governance. Reconsider who does what with a simple "RACI" model of Responsible, Accountable, Consulted, and Informed. Flip the script by removing accountability for activities when the role has no ability to impact the work - e.g., making a data steward responsible for an entire data domain when they have almost zero ability to change workflows, impact data quality, align expectations, etc.
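As a tiny, hypothetical illustration of that RACI flip - the roles, activities, and assignments below are invented for this write-up, not from Laura's book:

```python
# Hypothetical RACI matrix: activity -> {role: letter}
# R = Responsible, A = Accountable, C = Consulted, I = Informed
RACI = {
    "define data quality rules": {"data steward": "A", "domain engineer": "R", "governance team": "C"},
    "fix pipeline defects":      {"data steward": "A", "domain engineer": "R"},
    "publish glossary terms":    {"data steward": "R", "governance team": "A"},
}

# Roles that can actually change the outcome of each activity.
# Laura's point: don't leave someone Accountable for work they can't impact.
CAN_IMPACT = {
    "define data quality rules": {"data steward", "domain engineer"},
    "fix pipeline defects":      {"domain engineer"},
    "publish glossary terms":    {"data steward", "governance team"},
}

def misassigned_accountability(raci, can_impact):
    """Find (activity, role) pairs where accountability lacks influence."""
    issues = []
    for activity, assignments in raci.items():
        for role, letter in assignments.items():
            if letter == "A" and role not in can_impact.get(activity, set()):
                issues.append((activity, role))
    return issues

print(misassigned_accountability(RACI, CAN_IMPACT))
# -> [('fix pipeline defects', 'data steward')]: flip this accountability
```

The point isn't the code, of course - it's that the flip can be made mechanical: walk each accountability assignment and ask whether that role can actually move the work.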
Laura recommends that people stop just "putting their heads down" and working so they can drive something. Instead, really think about why you want to change what you're doing - what is your reason for wanting to disrupt your data governance? If it's just to shake things up, that's probably not going to go well. You will need some major force of will and perseverance to really make the change. So go into it for the right reasons.

So, how do you actually start to change your data governance practices and overall approach? Per Laura, start by evaluating the ways you currently do data governance and look for ways to break your approaches down into smaller pieces so you can do small-scale disruption and deliver incremental value. Delivering in small increments will make it quicker and easier to deliver value while lowering your chance of failure. As you start to show you are adding value, you will gain momentum, as most organizations do not really drive value from their data governance. People are typically pretty happy to enable value creation.

Per Laura, you should really rethink the way you do all aspects of data governance. Nothing is sacred. Spend the time to really consider all aspects of your data governance and think about whether you should change them. Laura even recommends committing to tossing aside practices before you evaluate whether they work. That way, you have to pick things back out of the garbage pile rather than sticking with the status quo. It's a bit of psychology, but it could be useful.

Scott asked what typically causes organizations to really rethink their data governance. At least for Laura, she typically gets a call when the data governance leader leaves and the CDO or CIO needs some help cleaning up the issues. And a data governance leader is typically a short-term role, per Laura. That leader typically drove so much through their own knowledge of "where the data bodies are buried" rather than through really scalable process, so it can cause a major disruption when they leave.

When the opportunity and drive to change your organization's data governance does arrive, Laura recommends rethinking data governance at the highest level. What are you really trying to accomplish? How do you get to "good enough"? How do you get comfortable with "good enough"? And it's crucial to understand that good enough for now may not be good enough for the future, and to build in a plan to reevaluate processes. But that bridge solution is still viable and valuable. It can be quite challenging to change the way people have approached data governance for the last 30 years. Break it up into small changes and get moving; build the muscle memory of change.

For Laura, data governance is often a proxy for - or a reflection of - your broader data culture. It's important to seek balance in your data governance approach between flexibility and rigidity, much like in your data culture. Too much flexibility will create too much chaos to move things forward - there is too little communication and/or coordination. Too much rigidity is essentially the world we are in today for most organizations, and we can probably agree that's not great.

Laura wrapped up on a few points. 1) Data governance is crucial to driving trust in data. Work with people to really communicate what is happening in your governance approach to increase that trust. 2) It's very easy to try to tackle everything in data governance, but focus on what matters. It's okay to have some sharp edges - what will drive more value? And 3) data governance work is incredibly hard. Laura wants you to know she gets it, and you data governance folks are seen. You do incredibly difficult and valuable work.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Aug 7, 2022 • 27min

Weekly Episode Summaries and Programming Notes – Week of August 7, 2022

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Aug 5, 2022 • 1h 14min

#109 Tying Data Strategy and Architecture to Business Strategy - Interview w/ Anitha Jagadeesh

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.
Anitha's LinkedIn: https://www.linkedin.com/in/anithajagadeesh/

In this episode, Scott interviewed Anitha Jagadeesh, Principal Enterprise Architect at ServiceNow. To be clear, she was only representing her own views on the episode.

Some key takeaways/thoughts from Anitha's point of view, some of which she specifically wrote:
- It is absolutely crucial to tie the data strategy to the business strategy. The business strategy must drive the data strategy, which drives your data architecture.
- Architects need to lead the way in digging into use cases to get the specifics on what data producers are trying to solve for data consumers. Then, those architects can find the common patterns across use cases to tie to your organizational data strategy and to your data architecture guidelines and principles. That way, instead of addressing challenges via point solutions, you can drive organization-wide choices that support many use cases via your data architecture.
- Architects also need to ask the probing questions to continuously tie work back to the business strategy and the value or expected outcome for customers. If you aren't driving the business strategy forward, if you aren't helping the big picture, is the work worth doing?
- When it comes to data, companies shouldn't be entirely offensive - trying to leverage data for as much value as possible - or entirely defensive - trying to minimize risk as much as possible. Organizations that have been very conservative need to push to be creative/offensive, and high-risk organizations will get themselves into trouble if they don't start playing defense too.
- As we build the data strategy, we have to catalog our data assets/products and the contracts to access them - internal, external, and third party. As a next step, we have to enable active metadata to ensure the catalog is always current.
- Data contracts - especially SLAs and SLOs - are really crucial to driving reliable and scalable data practices forward. How can people trust what they are consuming, without having to check it themselves, unless there are very specific parameters and documentation of what they're getting? The data space needs to rework the way we approach data contracts.
- We need to be careful not to head down the same paths/ways of working - just with different names - that we've tried and that didn't work. But we also need to focus on what we've learned from different approaches instead of reinventing the wheel where appropriate. Hopefully data mesh can thread that needle.
- When thinking about how you should split into domains, look at the business strategy. How does your organization tackle business challenges? That should inform how you create domain boundaries.
- One of the biggest challenges in data at the moment is centralization versus decentralization and/or federation. How far to go towards one side or the other, across many, many decisions, is really crucial to your data strategy. Look for places to centralize support of multiple use cases without taking the decisions out of the hands of the people who know best, where possible.
- API-first is an important strategy for modernizing use cases. But it can easily lead to massive inefficiencies on the analytical side with large-scale queries. So we need to think about how we can do APIs in an analytical world and consider patterns and guidelines to support bulk data consumption with volume, performance, and limits.
- It will be difficult but worthwhile for organizations to migrate existing data assets to decoupled data products. There are many ways to approach that challenge, such as the Strangler Fig Pattern (see the sketch further down), but you need to take lots of care to do it right rather than disrupt the ongoing business.
- Trying to serve real-time use cases - measured in millisecond latency - and certain other types of analytical queries from the same data product is likely to cause big issues. If you have a very large data pull from a service, that can greatly impact performance. Let's not go back to the days of running large-scale queries against production and causing outages; look for other architectural patterns to enable performance, like replicating data to other data products. Ideally, large-consumption analytical use cases should be managed with some limits to protect real-time use cases, or run against analytical data products, which will have different infrastructure that is fit for analytical purposes.
- It is crucial to have your governance team switch from being defensive-only and a bottleneck to being an enabling team - allowing domains to make smart decisions and providing the center of excellence and standards that let the domains focus on making the value-add, domain-context-specific decisions where possible.
- It's crucial for both sides in a potential data initiative/project to share as much context as possible about the potential outcomes weighed against the potential costs. How can both sides collaborate to maximize the return on investment? Just seeking the highest return possible is what has doomed many data initiatives - let's move past that way of working.
- We are heading towards hybrid-cloud, multi-vendor, multi-region, real-time needs for data. That will require us to rethink architecture so it can scale and support agility.

Anitha has seen a lot of data and engineering practices and patterns over her long career. In some ways right now, she is seeing many people heading down close to the same paths - just with new labels - that haven't worked. Data mesh tries to address a number of these historical challenges, but we should make sure to deeply understand what history has taught us so we don't need to reinvent everything or make the same mistakes. History may not specifically repeat, but it's easy for it to rhyme.

For Anitha, a lot of the approaches people are trying in data miss the mark by not focusing on the big picture first - what is your business strategy? Your business strategy should drive your data strategy, not just inform it, and then your data strategy should drive your data architecture. Far too many people start at the data strategy or even the data architecture level.

Anitha - like many of us - is seeing major changes in the industry, with most organizations transitioning to cloud, product-centric, and/or API-first approaches. Industries and organizations that have traditionally focused on defensive data strategies - those that protect the data to minimize risk, such as for compliance - will need to get offensive to compete and drive value. Luca Paganelli's episode covered how HERA is transitioning from defensive to a balanced approach. But on the flip side, the companies that have focused much more on offensive data strategies - trying to derive as much value as possible from data with few controls in place - really need to step up their defensive game. Companies with a balanced approach to offensive and defensive strategy are generally the most likely to win.

Data contracts are one of the biggest unresolved - or at least not well solved - issues in data for Anitha. The way most organizations are still doing data contracts - which is often not at all... - just isn't working. SLAs (service level agreements) and SLOs (service level objectives) are crucial to driving data trust when it comes to contracts. Emily Gorcenski's episode covered data SLAs and SLOs in depth. There are some approaches emerging but, as the many episodes of this podcast covering data contracts have shown, it's still a quite immature data practice that needs further work.

When asked about how to drive good, broadly applicable choices rather than just solving for the specific use case, Anitha talked about again circling back to the business strategy and the business use cases. You need to not be simply reactive to requests but look at how those requests play into the bigger picture. Architects should play a role in digging deep into use cases and requests and then finding the common patterns that support your architecture runway for teams to develop products. They need to dig in with many people across the organization and find what you really need to solve for in general across your many use cases. That way, you can address a broader scope of challenges more easily, rather than building to each use case. Much easier said than done, of course.

Anitha made an interesting point about how your business strategy and data strategy should drive your domains. Yes, every organization or industry has a different domain map, but when thinking about what you are trying to accomplish and how your organization tackles challenges, you should look to use that as your general approach to mapping out your domains. Is that a business need/capability? Is that application-first? Is this a domain-specific compliance need? Etc. Piethein Strengholt talked about multiple different ways to map domains in his episode.

When asked about how organizations can think about centralization versus decentralization in data, especially regarding governance, Anitha admitted it's very hard to create rigid rules that are actually good, despite how easy that might make things. You need to ask how you can centralize the standards and the tooling so you can support multiple use cases, but not have centralized decisioning when the domains know best. Having hybrid governance - grassroots and centralized - would drive better data management practices. Per Scott, it's definitely not a black-and-white decision, and this will be one of the hardest challenges for many organizations in the next few years.

A big trend for Anitha in digital modernization is more and more vendors heading towards API-first. But trying to use APIs like we have in the past will not be efficient at all in many cases in the analytics space. A very large query via API could be extremely inefficient. We need to think about how we can serve analytical needs better in an API-first world. What actually is an analytical API? How can we grab 100,000 records in a single query that isn't 1-by-1 pagination? That remains to be seen.
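One pattern that gets floated for this - an assumption here, not something Anitha prescribed - is an asynchronous bulk-export API: the consumer asks for a snapshot, the platform materializes it as chunked files, and the consumer downloads those in parallel. A minimal sketch of the consumer side, with invented endpoints and payloads:

```python
import json
import time
import urllib.request

BASE = "https://data-platform.example.com/api"  # hypothetical analytical API

def bulk_export(product: str, query: dict) -> list[str]:
    """Request an async bulk snapshot instead of paginating row by row."""
    # 1. Kick off the export job; the server materializes results as files.
    req = urllib.request.Request(
        f"{BASE}/products/{product}/exports",
        data=json.dumps(query).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    job = json.load(urllib.request.urlopen(req))

    # 2. Poll until the snapshot is ready (a webhook would avoid polling).
    while True:
        status = json.load(urllib.request.urlopen(f"{BASE}/exports/{job['id']}"))
        if status["state"] == "done":
            return status["file_urls"]  # chunked parquet/CSV, fetched in parallel
        time.sleep(5)

# Usage: one logical request for 100k+ rows, no per-page round trips, and the
# serving system is never hit with a long-running synchronous query.
# urls = bulk_export("bookings", {"since": "2022-08-01"})
```

This is one possible answer to "what is an analytical API", not the industry's settled one - which is exactly Anitha's point that it remains to be seen.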
While many organizations would love to have a greenfield to deploy their data strategy and data initiatives, that's just not the reality for most, according to Anitha. There are existing data assets in place. Moving them to being data products is essential, but business must also go on with the current day-to-day. It's a difficult challenge to migrate people over to new data products. And data monoliths have very unclear sets of data products all intertwined. As an example, in the past, Anitha and her team were the victims of their own success as they built a very successful data warehouse that more and more teams moved to use. As the number of use cases and the load increased, the performance decreased. We need to move to decoupled and more scalable ways of working in data to prevent success from being the path to failure and/or pain.
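As a sketch of how the Strangler Fig Pattern named in the takeaways might look when applied to that migration challenge - all names here are illustrative - a thin façade can keep consumers stable while datasets move out of the monolith one at a time:

```python
# Hypothetical façade that strangler-figs a legacy warehouse: consumers keep
# one entry point while datasets migrate to domain-owned data products.
MIGRATED = {
    "orders": "dataproduct://fulfillment/orders",  # already moved
    # "customers" and everything else still live in the legacy warehouse
}
LEGACY = "warehouse://central/{table}"

def resolve(table: str) -> str:
    """Return the current authoritative location for a dataset."""
    return MIGRATED.get(table, LEGACY.format(table=table))

# As each data product proves itself, add it to MIGRATED; consumers never
# change their code, and the legacy warehouse shrinks until it can be retired.
assert resolve("orders") == "dataproduct://fulfillment/orders"
assert resolve("customers") == "warehouse://central/customers"
```

The appeal of this approach is that it lets the business go on with its day-to-day, which is exactly the constraint Anitha flags for brownfield migrations.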
Anitha and Scott discussed how important it is to build specific solutions to be fit-for-purpose, especially around SLAs and SLOs. On the data side, if something really needs to serve "in real time", meaning measured in milliseconds, you wouldn't also want to allow heavy analytical queries that could slow down what it is serving. Which circles back to why API-first is currently challenging. But you should also dig into what people mean when they say "in real time", because it is often "not on a 24-hour delay, but 2 hours is fine". Get specific; dig into the details and the why.

Anitha has some specific recommendations regarding data governance, as she views it as crucial to really getting data products right. She recommends creating a general center of excellence and central tooling support, but with grassroots decisioning when it makes sense. So create the standards centrally and look to empower teams, but also have that center of excellence serve as a backdrop of experts on how to meet general governance needs like complying with GDPR, CCPA, etc. You want your domains focusing on the value-add decisions; have the central governance team look to be an enabler.

A key responsibility for all architects, in Anitha's view, is digging into how and where the data is planned to be used. What is the target use and, more importantly, what is the target outcome? When digging in, you can really assess not just what the return might be, but what the cost is - both upfront and ongoing. It's easy to think about how great it could be to have some massive set of data, but there needs to be a balanced return on investment. And time-to-market is also crucial. So we need to encourage collaborative negotiation around data requests instead of simply handing over requirements.

In wrapping up, Anitha circled back on the concept of architects - and other roles too - really asking probing questions about specific use cases while keeping the big picture in mind at all times. Ask if this work supports the big picture, the business strategy. If it doesn't, or if there isn't a clear tie to the business strategy, is the work worth doing?

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Aug 2, 2022 • 12min

#108 The Slippery Slope of "Real-Time" and Data Mesh - Mesh Musings 24

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
