
Data Mesh Radio
Interviews with data mesh practitioners, deep dives/how-tos, anti-patterns, panels, chats (not debates) with skeptics, "mesh musings", and so much more. Host Scott Hirleman (founder of the Data Mesh Learning Community) shares his learnings - and those of the broader data community - from over a year of deep diving into data mesh.
Each episode contains a BLUF - bottom line, up front - so you can quickly absorb a few key takeaways and also decide if an episode will be useful to you - nothing worse than listening for 20+ minutes before figuring out if a podcast episode is going to be interesting and/or incremental ;) Hoping to provide quality transcripts in the future - if you want to help, please reach out!
Data Mesh Radio is also looking for guests to share their experience with data mesh! Even if that experience is 'I'm confused, let's chat about this specific topic' - yes, that could be you! You can check out our guest and feedback FAQ - including how to submit your name to be a guest and how to submit feedback, anonymously if you want - here: https://docs.google.com/document/d/1dDdb1mEhmcYqx3xYAvPuM1FZMuGiCszyY9x8X250KuQ/edit?usp=sharing
Data Mesh Radio is committed to diversity and inclusion, including in our guests and guest hosts. If you are part of a minoritized group, please see this as an open invitation to be a guest - please hit the link above.
If you are looking for additional useful information on data mesh, we recommend the community resources from Data Mesh Learning. All are vendor independent. https://datameshlearning.com/community/
You should also follow Zhamak Dehghani (founder of the data mesh concept); she posts a lot of great things on LinkedIn and has a wonderful data mesh book through O'Reilly. Plus, she's just a nice person: https://www.linkedin.com/in/zhamak-dehghani/detail/recent-activity/shares/
Data Mesh Radio is provided as a free community resource by DataStax. If you need a database that is easy to scale - read: serverless - but also easy to develop for - with many APIs including gRPC, REST, JSON, and GraphQL, all of which are OSS under the Stargate project - check out DataStax's AstraDB service :) Built on Apache Cassandra, AstraDB is very performant and, oh yeah, is also multi-region/multi-cloud so you can focus on scaling your company, not your database. There's a free-forever tier for poking around/home projects and you can also use code DAAP500 for a $500 free credit (apply under payment options): https://www.datastax.com/products/datastax-astra?utm_source=DataMeshRadio
Latest episodes

Nov 27, 2022 • 26min
Weekly Episode Summaries and Programming Notes – Week of November 27, 2022
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Nov 25, 2022 • 1h 16min
#159 Focusing on the Problems - And Business - at Hand in Your Data Tool Selection Process - Interview w/ Brandon Beidel
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Brandon's LinkedIn: https://www.linkedin.com/in/brandonbeidel/
In this episode, Scott interviewed Brandon Beidel, Director of Product at Red Ventures.
Some key takeaways/thoughts from Brandon's point of view:
- Be willing to change your mind, especially based on new information. Be willing to measure and iterate. It's easy to get attached to tools or tech because they are cool. Don't! Stay objective.
- It's crucial to align on what problem(s) you are trying to solve and why before moving forward on vendor/tool selection, whether build or buy. If it doesn't have a positive return on investment, why do the work?
- Beware the sunk cost fallacy! It's easy to not want to shut something down that you've spent a lot on. But don't throw good money after bad.
- When gathering or negotiating requirements, have a 'maniacal focus' on asking "what does this drive for the business?" You can quickly sort the nice-to-haves from the needs and have an open and honest conversation about the cost/benefit of each aspect of a request.
- When thinking about maximizing value, there is always one constraint that is the bottleneck. You can optimize other things but they won't drive the value. Find and fix the value bottleneck.
- A simple two-axis framework for thinking about use cases and requirements is value versus complexity. Look for high-value, low-complexity work first.
- Be open and honest in discussions around the expected costs of work/tools - costs can be considered part of the complexity. The data consumers understand the value and can weigh the return on investment.
- It's very important to understand data consumers' incentives so you can collaboratively figure out what is best for all parties.
- Look to create - in the open - a decision journal for build-versus-buy / vendor selection. It will create an open environment and get your thoughts better organized.
- Your decision journal also makes it politically easier to say you have new information and should consider a change. And you can better measure whether your assumptions were right and whether a tool or solution is still working for you.
- It's crucial to look at what major success of a tool selection would mean - what happens if usage is 10x or 100x your initial expectation? For certain selections that can lead to really poor unit economics down the road, so it shouldn't be overlooked.
- It's easy to over-innovate. Think of having a certain number of innovation tokens. The cost of change is real and it also wears on people's patience. Look first at whether existing tooling or capabilities support most of your use case.
- Total cost of ownership - not just the initial purchase cost - is crucial. How much of your team's time will be spent managing and maintaining the tool? Look especially at skills, governance controls, and the ability to measure whether you are successful.
- Perfect is the enemy of good in choosing tools. Use a well-defined process to avoid really bad decisions, but spending time to find the absolute best solution when any one of six choices will do just fine is rarely worth it.
- Having your reasoning and process written down and in the open drives trust - trust in the initial decision and trust for when it's time to reevaluate a tool. It also makes it easier to spot when something relative to your initial assumptions has changed.
- Seek out those who might be the most against your decision. Take the time to understand their pain points and concerns; try to incorporate their concerns and align their incentives if possible.
- When adding a new tool or serving a new use case, focus on how you will measure whether you are successful, now and in the future. It doesn't have to be perfect, but otherwise you don't know how well you are doing and you miss a great opportunity to learn and do better in the future.
- When you select a vendor, there is a logical time to reevaluate your choice and whether it's right going forward - the contract renewal - and there are easily defined economics in play. Do the same for anything you've built: set an artificial time to reevaluate, don't wait for things to go bad first.
- Consider using the anti-corruption layer concept from microservices in data. You can avoid a lot of data integration costs and you are more easily able to rip things out of your platform. It's okay to leverage proprietary solutions too, just be cognizant they may become an issue.
- Involve the data consumers early in the process of serving their use case. It helps for them to have skin in the game so they are focused on driving to the most business-efficient outcome.
Brandon started off with a theme he'd hit on multiple times because it's so important: before proceeding on selecting a tool/solution, agree on what needs to be done and why. What will this drive for the business? It's easy to lose the forest for the trees - or even the leaves - in building out data platforms. The first part - agree - is necessary because you need alignment to move forward with the proper understanding of the problem to be solved. The 'what needs to be done and why' part means there is a clear roadmap and a specific problem you are trying to solve when doing your tool evaluation, instead of focusing on the tool or feature.
Having a maniacal focus on 'what does this drive for the business' means you can align better on what is needed for a use case versus "a Christmas list," as Brandon put it. Having clear and open communication about what is a requirement versus a desire, and the cost of each potential item on a data consumer's list, has led to very efficient prioritization for him.
A key way of working when embarking on a new use case is to involve the data consumers early on - and make sure they have skin in the game, according to Brandon. The data team's engineering time being on the data consumer's P&L means the data consumers are more focused on driving to key results than on cool features or nice-to-haves. And having open and honest discussions about the expected costs to deliver on each really helps them weigh the benefits.
An important part of getting to a good outcome in these discussions is understanding and attempting to align on everyone's incentives.
Brandon mentioned how, when discussing cost/benefits and different platform approaches, it's very easy to get overly complex. That hurts the conversation and often devolves into technical discussions with people who care about the business output, not the tech. Brandon uses two axes - complexity and value - and doesn't overcomplicate it. It's pretty easy to start with use cases that are high value and low complexity once you look at things through this lens. High-value but high-complexity use cases are tough but can obviously provide very significant value once you've taken care of the low-hanging fruit.
One thing Brandon mentioned - and Scott recommends more broadly for data mesh journeys - is a decision journal. Having a place in the open where you write down the criteria for a decision makes people more comfortable with the decisions made. What were the capabilities needed, what was the problem, what was the expected value, etc.? When getting down to the decision itself, how viable is the solution, what are the alternatives, what is the likely cost, what are the failure scenarios, etc.? It also helps you reevaluate in the future and have empathy with past decisions. Brandon has a list of many more crucial questions.
A really interesting point Brandon brought up regarding writing out your decision criteria is what happens if it's wildly successful. What happens if the tool/feature you choose, whether build or buy, has 10x the expected usage? 100x? Are the unit economics going to be good, or will this potentially cause issues - and how do you plan to adapt?
According to Brandon, looking at total cost of ownership - not just the short-term or initial purchase cost - is crucial when selecting a tool. Do you need training to actually leverage the tool and manage it appropriately? Does it integrate well with your existing platform/tools? Again, this circles back to value versus complexity: costs should be factored into the complexity discussion.
Brandon emphasized that perfect is the enemy of good. There is rarely a good return on finding the absolute best choice - the real benefit is in avoiding the wrong choices. If tool B has a 5% better return than tool A but you had to spend months figuring that out - or what if it's 6 tools…? - that's not worth doing.
As part of his decision journal recommendation, Brandon circled back on a few other benefits. A big one is that people are more likely to be aligned with a decision if they can follow the logic. If it's just a choice, instead of seeing why the choice was made, there's often more friction and pushback. It's also easier to monitor whether things have changed relative to your assumptions when those assumptions are explicitly stated :) Having your assumptions on paper also gives you better buy-in to make changes because, again, people can follow the logic.
When it comes to driving buy-in, Brandon recommends seeking out the people who are most likely to be detractors of your potential solution. Use collaborative negotiation. At the very least, go and understand their context and pain points. Try to incorporate that into your solution and look to align incentives where possible. Too often people don't feel seen or heard.
As many guests have mentioned, look to set your success criteria - and especially your ways of measuring - before you start implementation work. It doesn't have to be perfect, but otherwise, are you able to measure when you are doing well? And you can learn much more from things that don't go to plan if you can measure against an actual plan.
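To make the decision-journal and value-versus-complexity ideas above a bit more concrete, here is a minimal, illustrative sketch in Python. The field names, the 1-5 scoring scale, and the fixed/variable cost split in the 10x/100x check are assumptions for illustration only - they are not from the episode - but they show how a team might keep decision records in the open and surface high-value, low-complexity work first.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DecisionJournalEntry:
    """One openly shared record of a build-vs-buy / vendor decision (illustrative fields)."""
    decision: str                    # e.g. "Adopt hypothetical tool X for orchestration"
    problem: str                     # the business problem this is meant to solve
    expected_value: str              # what it should drive for the business
    alternatives: list[str]          # other options considered
    assumptions: list[str]           # explicit assumptions to revisit later
    failure_scenarios: list[str]     # what going wrong would look like
    monthly_cost_estimate: float     # rough total cost of ownership, not just license
    reevaluate_on: date              # artificial checkpoint - don't wait for things to go bad

    def cost_at_scale(self, usage_multiplier: float, variable_share: float = 0.7) -> float:
        """Rough check of unit economics if usage is 10x or 100x the initial expectation.
        variable_share is the assumed fraction of cost that grows with usage."""
        fixed = self.monthly_cost_estimate * (1 - variable_share)
        variable = self.monthly_cost_estimate * variable_share * usage_multiplier
        return fixed + variable

@dataclass
class UseCase:
    """A candidate use case scored on the two axes Brandon describes."""
    name: str
    value: int        # 1 (low) to 5 (high) - business value
    complexity: int   # 1 (low) to 5 (high) - includes expected cost/effort

def prioritize(use_cases: list[UseCase]) -> list[UseCase]:
    """Sort so high-value, low-complexity work comes first."""
    return sorted(use_cases, key=lambda u: (-u.value, u.complexity))

if __name__ == "__main__":
    backlog = [
        UseCase("Churn dashboard refresh", value=4, complexity=2),
        UseCase("Real-time personalization", value=5, complexity=5),
        UseCase("Ad-hoc finance extract", value=2, complexity=1),
    ]
    for uc in prioritize(backlog):
        print(f"{uc.name}: value={uc.value}, complexity={uc.complexity}")
```

Nothing here is prescriptive - the point is simply that writing assumptions and checkpoints down, in the open, makes them easy to revisit when the contract renewal (or your artificial reevaluation date) comes around.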
Brandon discussed how, when you make the choice to go with a vendor, the contract renewal - or specifically a few months before the renewal - is the time to evaluate whether it was a good choice and whether you should continue with it. You should set up an artificial timeline to do the same for anything built internally. Instead of waiting for signals that you've made a wrong choice, regularly reevaluate. It's important to reflect back and see if it's actually solving the challenges you wanted it to solve.
Beware the sunk cost fallacy! According to Brandon, it's very common to want to keep chasing things where you've already spent lots of time and/or effort, or things that had a lot of promise but aren't meeting expectations. Don't throw good money after spent money. Take it as a learning opportunity and move on.
Circling back on tool stewardship and total cost of ownership (TCO), Brandon uses a framework of three main things to consider: skills, governance controls, and ability to measure. Do you actually have the people who can leverage a tool? Do you have the governance in place to use it properly? How will you measure whether the tool is successful and being used as expected? He had a lot of good examples in the episode.
Brandon recommends people look into applying the anti-corruption layer concept from microservices in their data platform. It can lower integration costs and also make it far easier to rip things out. You don't want to focus so much on this that you never leverage proprietary features, though. You don't need to build every capability from scratch, but also don't unnecessarily lock yourself in.
Some other tidbits:
- Your business counterparts probably won't care much about which vendor or feature versus what it gives them. Start at the high level, mapping what's needed.
- You need to define the problem you are trying to solve, not the vendor.
- People are only willing to deal with so much innovation. Think of having tokens that people collect from you every time you try something innovative and new - that's not an easily renewable resource. Look to what you already have to see if it will work.
- When thinking about maximizing value, there is always one constraint that is the bottleneck. You can optimize other things but they won't drive the value. Find and fix the bottleneck.
- "Knowledge has a half-life, decisions have a half-life." Don't get analysis paralysis; look to move quickly.
- Be willing to measure and iterate. Be willing to change your mind, especially based on new information. It's easy to get attached to tools or tech because they are cool. Don't; stay objective.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Nov 23, 2022 • 17min
#158 Zhamak's Corner 10 - Blazing Trails not Blazing Saddles - Setting Yourself Up For Success
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Sponsored by NextData, Zhamak's company that is helping ease data product creation.
For more great content from Zhamak, check out her book on data mesh, a book she collaborated on, her LinkedIn, and her Twitter.
What can we do now relative to data mesh with what we have? People want to move, not wait for the tools to evolve. We can start to shift in anticipation of tooling getting better. It might not make things a ton better now, but when tools start to emerge, then we can jump ahead quickly. Learn from what happened in the API revolution and don't compromise on interoperability - that will just lead to high quality data silos, which is not a great outcome. And we need to get to a place with data where consumers have a delightful experience going from discover to learn to trust to use with little friction.
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Data Mesh Radio episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Nov 21, 2022 • 1h 14min
#157 Getting Practical with Data Privacy - Interview w/ Katharine Jarmul AKA K-Jams
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Katharine's LinkedIn: https://www.linkedin.com/in/katharinejarmul/
Practical Data Privacy (Katharine's book in early release): https://www.oreilly.com/library/view/practical-data-privacy/9781098129453/
Katharine's newsletter: https://probablyprivate.com/
'Privacy-first data via data mesh' article by Katharine: https://www.thoughtworks.com/insights/articles/privacy-first-data-via-data-mesh
danah boyd's website: https://www.danah.org/
In this episode, Scott interviewed Katharine Jarmul AKA K-Jams, Principal Data Scientist at Thoughtworks.
Some key takeaways/thoughts from Katharine's point of view:
- Increasing privacy around data does NOT mean you have to give up value.
- Instead of data privacy being a blocker, it can turn nos into yeses because there is a better ability to restrict illegal/unethical use. Regulatory and legal people want to say yes, so give them the ability to do so.
- There are lots of tools available to enhance your data privacy now - this isn't a pipe dream. That said, don't look to replace person-to-person conversations and decisions with tech. You'll learn when to use what on your journey; it's okay to iterate :)
- Empower the people who know the data best with privacy tooling - and don't make them build it themselves. They will know best most of the time, but obviously provide them a path if they have questions/concerns.
- Privacy is a sliding scale, not all or nothing. You can start off pretty not-great and still make progress as you continue to assess where you can do better.
- Use privacy as a lens for how valuable your sensitive information actually is. Can it be used appropriately? If so, what's the value of leveraging that data and is it worth the privacy cost?
- When thinking about what level of privacy to enact in systems, think about your comfort level if it were your own data. Would you want your location history shared at all times? Using a multi-cultural approach is important too, as different cultures have different norms around privacy.
- There is some table-stakes privacy tech you should use as part of general information security to protect sensitive information - it might even be required by law. Then look to layer privacy-enhancing technology on top to do things like actual anonymization.
- Privacy choices should be broad, organization-driven decisions, not something one person decides to implement. But if there isn't buy-in, it can sometimes be tough to show people the business value of data privacy.
- Look to the privacy/utility trade-off - how can we maximize privacy while still maximizing what we need to do the task at hand?
- An emerging practice with a big potential impact on privacy in data mesh is federated analytics and distributed queries - can we do analytics on data where it sits, without moving it to a big central place for analysis?
- A lot of data scientists don't want to work on things they feel are problematic. So work to prevent problematic use cases and the problematic data practices that can give rise to them.
- Privacy can be about first-order problems like how you anonymize your data. But it can also be much broader, like understanding the impact of the work we do on people and society as a whole.
- We need to make privacy more transparent and obvious, otherwise people feel tricked. People want to know what they get out of sharing their information.
- In data mesh, look to offer an easy ability to adjust privacy - privacy 'knobs' - so the domain expert can easily make choices without having to implement the tech. Enable privacy via the platform.
- Typically, data privacy is applied at the data source. We are still learning how to do data privacy well in a federated setup with cross-domain data combination restrictions.
- If you're constantly rejecting use cases, that will just create shadow IT. As an organization, figure out how you can get to a yes where you can.
- Scott Note: K-Jams is writing her book to fill the gap between very basic information about privacy technology and academic research.
Katharine shared how she first started looking into privacy. She was doing natural language processing (NLP) and working with lots of customer data that was supposedly clean and anonymized - but it wasn't. That raised a red flag for her, as it can mean working with some problematic things. So she went looking for solutions and discovered there were already a lot of privacy technologies - and there are even more now. We can look at first-order privacy questions like how we actually anonymize data, or at bigger and more philosophical questions about how we prevent harm to society from our technical work.
While privacy can be a bit of a nebulous topic, Katharine recommends starting from the gut check: basically, what would you be comfortable with? Would you be okay with your chat history being shared with others? What about your location at all times? If not, look to prevent that from happening in what you are working on. It's important to also use a multi-cultural lens on what is acceptable, as what's okay - or at least tolerated - in the US is not in Europe. This of course extends to fines from GDPR but also to reputation. Think about how data can be misused and look to prevent that. Scott Note: see episode 143 for more about data ethics and what we can do to prevent misuse.
Given that privacy is very contextual to the individual, K-Jams believes that far too often, when we translate privacy decisions to code, we lose the context. Part of addressing that is making privacy decisions and options obvious. If you are collecting people's locations, are you giving them an easy way to see why, or to opt out? It's very easy for people to feel tricked.
Katharine gave an example of how digital natives - mostly teenagers on Instagram about 10 years ago - kept multiple accounts, called 'Finsta' accounts for fake-Instagram, with different levels of anonymity and privacy to better navigate the opaque privacy settings and share data more granularly with whom they wanted. It gave them a way to control how they were sharing their data and with whom. But most people aren't capable - or willing - to do something like that. To drive trust, K-Jams believes we need to show people what they are getting from sharing their data - what the benefit is. Then they can make an informed decision. But privacy settings are often opaque at best. She believes that companies leaning into privacy conversations with users will create a better relationship with customers. How are you delivering value back to customers from the data they share with you? And increasing privacy doesn't mean you have to give up the value of your data either.
A few privacy techniques where tooling already exists, which Katharine mentioned, are pseudonymization, tokenization, masking, and format-preserving encryption - she recommends these as basics to protect PII (personally identifiable information) or other sensitive information. These are just table-stakes information security best - maybe even just 'not bad' - practices. Then you want to look at potentially layering additional technology on top, like differential privacy. We can even leave data where it is and do federated analytics and federated learning, which has implications for data mesh and machine learning.
When looking at the value of privacy, it can be tough to drive buy-in internally - people assume cost with no additional value. But according to K-Jams, you can look at the privacy/utility trade-off: how can we simultaneously maximize privacy while not inhibiting the work we need to do? And how, in data mesh, do we actually find those sweet spots? Katharine believes it's through giving the data owners the ability to tune privacy - think knobs - to the specifics of the need/use case. That's part of doing federated computational governance through a self-service platform, after all. Try saying that three times fast…
K-Jams believes it's easiest - at least with current technology - to apply privacy at the data source. But when thinking about something like data mesh, there may be additional challenges, like data from domain A and domain C not being allowed to be combined. So we are still learning how to do data privacy well in a federated environment. Scott Note: Jesse Paquette covered this for healthcare data in episode 10, where certain anonymized information could be joined with other anonymized data to make it personally identifiable. Many people are saving those "tricky" use cases for later, or are not trying to automate privacy for them and are cordoning off those data products except by request.
When leveraged well, Katharine believes data privacy technology can actually add more value. If a data producer is not sure how data consumers will use sensitive data, they are very unlikely to share it. But if they can lock down the data in certain ways while still giving access, that is a win-win: the data consumers get access to information they wouldn't have gotten otherwise, and data producers can still sleep at night. It can turn a no into a yes. Sarita Bakst mentioned something similar in episode 52. And you can also get past legal and regulatory barriers if you do data privacy right - your legal and regulatory people want to say yes, so give them the ability to turn their no into a yes. Offer up offsets for potential privacy concerns - say, only using the data in a sandbox to start - to see where their issues are.
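As a rough illustration of the privacy 'knobs' idea and the basic techniques mentioned above, here is a minimal sketch in Python. The column names, the knob settings, and the keyed-hash pseudonymization approach are assumptions for illustration - not something prescribed in the episode - but they show how a platform could let a domain owner dial privacy per use case without having to implement the tech themselves.

```python
import hashlib
import hmac

# Hypothetical per-use-case privacy settings a domain owner might choose ("knobs").
# Available actions per field: drop, pseudonymize (keyed hash), mask, or keep.
PRIVACY_KNOBS = {
    "marketing_analytics": {"email": "pseudonymize", "phone": "mask", "ssn": "drop"},
    "fraud_investigation": {"email": "pseudonymize", "phone": "keep", "ssn": "pseudonymize"},
}

SECRET_KEY = b"rotate-me-and-keep-in-a-secrets-manager"  # placeholder only

def pseudonymize(value: str) -> str:
    """Stable keyed hash: same input -> same token, not reversible without the key."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

def mask(value: str, visible: int = 2) -> str:
    """Keep only the last few characters, e.g. for a phone number."""
    return "*" * max(len(value) - visible, 0) + value[-visible:]

def apply_privacy(record: dict, use_case: str) -> dict:
    """Apply the domain's chosen knob per field before sharing data for a use case."""
    knobs = PRIVACY_KNOBS[use_case]
    out = {}
    for column, value in record.items():
        action = knobs.get(column, "keep")
        if action == "drop":
            continue
        elif action == "pseudonymize":
            out[column] = pseudonymize(value)
        elif action == "mask":
            out[column] = mask(value)
        else:  # "keep"
            out[column] = value
    return out

if __name__ == "__main__":
    customer = {"email": "jane@example.com", "phone": "5551234567", "ssn": "123-45-6789"}
    print(apply_privacy(customer, "marketing_analytics"))
    print(apply_privacy(customer, "fraud_investigation"))
```

Keyed hashing like this is pseudonymization, not anonymization - as Katharine notes, stronger techniques such as differential privacy sit on top of these basics.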
For K-Jams - and Scott - the desire to remove the people from the technology aspects of things like privacy ends up being silly. We can't make decisions only via the tech. Stop trying to replace conversations with technical solutions; sometimes people just need to collaborate to get where we need to go. Don't make it a yes-or-no decision for someone like legal - exchange context and look to collaborate on a positive outcome instead of asking 'can I do this?'
Katharine gave a good overview of how to move up the privacy ladder about 47 minutes into the interview (not the episode; likely ~55 minutes into the episode). How do you move from not so great, to okay, to pretty good, to good (but using meh -> eh -> heh -> hah because of Scott…)? Privacy isn't all or nothing; you can improve and iterate.
Quick tidbits:
- Second-layer privacy-enhancing techniques mentioned: differential privacy, data minimization, federated analytics, federated learning, distributed querying, encrypted computation, and secure multi-party computation.
- You probably won't get your privacy perfect on your first try. That's okay. Look to prevent regulatory/compliance issues but, much like all aspects of data mesh: try, learn, iterate.
- Think about what you can and cannot show in a data catalog about potentially sensitive data. You can share descriptive statistics and information about use cases without exposing the sensitive data until you know a new use case is allowed/ethical. Look to share as much information as you can - where appropriate - instead of locking down anything related to sensitive information.
- Empower the people who know the data best with privacy tooling. Don't make them build it themselves - they will know best most of the time - but obviously provide them a path if they have questions/concerns.
- It's very easy for privacy concerns to become overbearing. If you reject use cases 90% of the time, you will create shadow IT, and that is far more dangerous for legal and regulatory reasons. Look to exchange context and work towards a viable solution.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Nov 20, 2022 • 28min
Weekly Episode Summaries and Programming Notes – Week of November 20, 2022
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Nov 16, 2022 • 18min
#156 Zhamak's Corner 9 - A Vision of the Data Product Developer Role in Data Mesh
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Sponsored by NextData, Zhamak's company that is helping ease data product creation.
For more great content from Zhamak, check out her book on data mesh, a book she collaborated on, her LinkedIn, and her Twitter.
Who will be the data product developer in data mesh? There has been a misconception, in Zhamak's view, that application developers should be the ones focused on building the data products as well - but she thinks they already have a full-time role :) Still, we need someone applying software engineering practices and data know-how to building data products. Right now, to do data work, you need way too much tool knowledge instead of data understanding. We have hyper-specialized data roles - ML engineer, data engineer, etc. - when we should have data developers who can tackle these challenges better.
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Data Mesh Radio episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Nov 15, 2022 • 14min
#155 Phase Shifting - Preparing for Data Mesh Adoption Going Wide in Your Org - Mesh Musings 35
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Nov 14, 2022 • 1h 13min
#154 How Can Data Marketplaces Help Realize the Most Value from Our Data - Interview with Mozhgan Tavakolifard, PhD
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Mozhgan's LinkedIn: https://www.linkedin.com/in/tavakolifard/
In this episode, Scott interviewed Mozhgan Tavakolifard, Data and AI Lead for the Nordics at Accenture. To be clear, she was only representing her own views in this episode.
Before we jump in: most of the conversation was about external data marketplaces rather than internal data marketplaces within an organization. It's also important to note that data marketplace technology and implementations are still at a relatively early stage - they are quickly evolving and maturing.
Some key takeaways/thoughts from Mozhgan's point of view:
- Data marketplaces - both internal and external here - significantly lower the bar to data consumption because of standard metadata and user experiences. You should be able to easily see quality metrics, see who owns a data product, access documentation, etc.
- Data marketplaces, when done right, significantly lower the time to value realization for both data producers and consumers/purchasers. And standard quality measurements and metadata make it easy for consumers to understand how much they can trust data, which makes purchasing decisions easier.
- Practices and tools are emerging for tracking data quality all the way to the source to increase the trust data consumers/purchasers can put in data, especially for data marketplaces.
- For external data marketplaces, trust and security are still major pain points. How can data producers trust that consumers will protect the data they acquire and use it legally and ethically? What is the producers' risk if consumers behave improperly?
- ?Controversial?: Mozhgan believes smart contracts and blockchain/distributed ledgers can provide for compliant use by others purchasing data. Some marketplaces are already doing this.
- Data producers also want better ways to ensure data consumers/purchasers are only using data in the agreed way, so they can charge for any additional use cases. So they will be heavily incented to work with marketplaces that have tracking mechanisms in place.
- "Data ethics is a nightmare," even before we think about data marketplaces. And that's not just data bias.
- We can apply a number of the techniques and guardrails used to ensure ethics around bias in AI to data marketplace ethics around bias.
- In a data marketplace, ethics falls much more on data producers than most people realize/expect. You should not sell data that can be misused! One way to prevent misuse is to sell insights instead of the data itself.
- Look to focus much more on the business returns of data work. Far too much focus is on the value generated without looking at the costs.
- It's crucial to see organizations as living, breathing ecosystems. Design your organization and ways of working to be able to adapt.
For Mozhgan, data mesh is a perfect fit with data marketplaces: a data marketplace makes it simple for producers to share data in a standardized way and for consumers to easily find and consume data with standardized metadata and access. Simply put, data marketplaces are the most sensible place and mechanism for sharing data, in her view. They significantly lower the barrier to getting access to data and being able to understand it - including how much it can be trusted.
Data marketplaces are good for internal data sharing but even better for being able to monetize your data externally, according to Mozhgan. Again, the standardization and clear rules about allowable use mean a faster time from discovery to value for both data producers and consumers/purchasers. Data having clear and concise SLAs means consumers can quickly go from discovery to trusting the data, meaning they can quickly leverage it for their own use.
However, major pain points for external data marketplaces are trust and security. Data producers must create the trust in their data for others to use it, but there is also a big risk in how data consumers/purchasers actually use a producer's data. Is it compliant/legal use? Is it ethical use? Will those data consumers properly protect the data they consume? If not, what is the risk to the data producer? How can we ensure proper behavior - whatever that means to the data producer - by the data consumer/purchaser?
Mozhgan believes blockchain/distributed ledgers might provide a good way to track compliant usage - are consumers meeting their contractual terms? Smart contracts are supposedly able to track this. However, ethical concerns are still not addressed in smart contracts, at least not in a simple and repeatable way - the ways of doing this are still evolving. And she believes we can't really get to large-scale data marketplaces without something like blockchain. Note: Scott is much more skeptical, given there are few examples he is aware of where blockchain is really working for trust and security - can you really track usage in someone else's systems? What about their security capabilities to prevent a data breach? Can we actually track ethical use of data?
Another aspect Mozhgan mentioned is that data consumers can only use data they purchase in ways allowed by the contract. Sarita Bakst mentioned this when talking about externally purchased data - data producers want to maximize monetization, so data purchasers have to pay for each individual use case. Data producers therefore want to track that consumers/purchasers are actually adhering to that part of the contract. There are a number of recent examples where data sellers have wildly different prices for the same data in PDF form versus via an API. The API probably costs less to maintain, but there's a strong correlation between consuming via API and getting a lot of value from the data consumed.
When it comes to data consumer trust - can they actually trust the data? - Mozhgan believes we are seeing better ways of tracking data quality all the way up to the source. That independent verification is crucial. If data consumers/purchasers understand the exact quality dimensions, the data typically becomes immensely more valuable. Stolen credit card numbers on the dark web go for pennies because you can't really trust the source, for example.
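To give a feel for the standardized metadata, ownership, SLA, and quality information discussed above, here is a minimal illustrative sketch of what a marketplace listing for a data product might carry. The specific field names and example values are assumptions for illustration, not taken from the episode or from any particular marketplace.

```python
from dataclasses import dataclass

@dataclass
class DataProductListing:
    """Hypothetical standardized descriptor a marketplace could publish for each data product."""
    name: str
    owner: str                      # who is accountable for the data product
    description: str
    documentation_url: str
    freshness_sla_hours: int        # how stale the data is allowed to get
    completeness_pct: float         # one example quality dimension
    allowed_uses: list[str]         # contractually agreed use cases
    lineage_verified: bool          # has quality been tracked back to the source?

    def trust_summary(self) -> str:
        """A consumer-facing, at-a-glance view that supports a consumption/purchase decision."""
        lineage = "verified to source" if self.lineage_verified else "not verified"
        return (f"{self.name} (owner: {self.owner}) - refreshed within "
                f"{self.freshness_sla_hours}h, {self.completeness_pct:.1f}% complete, lineage {lineage}")

if __name__ == "__main__":
    listing = DataProductListing(
        name="retail-footfall-daily",
        owner="store-operations-domain",
        description="Daily store visit counts aggregated by region",
        documentation_url="https://example.com/docs/retail-footfall",
        freshness_sla_hours=24,
        completeness_pct=97.5,
        allowed_uses=["internal analytics", "demand forecasting"],
        lineage_verified=True,
    )
    print(listing.trust_summary())
```

The point of the sketch is simply that when every listing carries the same ownership, SLA, quality, and allowed-use fields, a consumer can go from discovery to trust without a bespoke negotiation for each dataset.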
Mozhgan gave a really interesting example of where data marketplaces could take us. Utilities need to monitor trees and proactively trim them where possible so they don't disrupt power or phone lines. But each individual utility typically does not have a great information set internally - often simply from lacking enough data to be good at proactive tree trimming. So utilities are trying to get to a place where they can jointly share information with each other to improve their predictions of where to trim. However, the lack of a standard way to share data is making it quite difficult to actually achieve the desired results. So how can we learn to quickly share information across organizations without a long and complicated process to do things like design a standard data model? Could a marketplace help?
"Data ethics is a nightmare," even outside of data marketplaces, according to Mozhgan. This is not just AI model ethics with bias and the like - there are often unethical ways of presenting data, and of course there are many companies collecting and using data unethically. And we don't necessarily always want to remove all bias - it may have predictive power. But we need to focus more on the impact of our decisions on both the input and the output/impact side of data. She believes we can use a lot of the guardrails we use around AI to ensure ethics in data marketplaces.
Mozhgan recognized that ethics will always be a bit messy when sharing data outside the organization. One suggestion to prevent ethics issues is to share only the insights instead of the actual data used to generate them. Or you can share pseudo-anonymized data. But at the end of the day, ethics falls much more on data producers than most expect. You have a duty not to sell data that can be misused!
For Mozhgan, there is too much focus on the value generated from data work instead of the actual return on investment. This happened in AI with massive hype, and it's happening more in analytics recently - everyone needs to be data-driven, right?! You need to create a business case and look at what the expected costs will be for data work. We don't have really easy paths to predicting exact value, but we can get better at that and be realistic about expected costs.
Quick tidbits:
- Knowledge graphs will be crucial to sharing data with other organizations, and internally for data mesh.
- It's crucial to see organizations as living, breathing ecosystems. Design your organization and ways of working to be able to adapt.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Nov 13, 2022 • 14min
Weekly Episode Summaries and Programming Notes – Week of November 13, 2022
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Nov 11, 2022 • 1h 12min
#153 Federated Data Governance Through Changing Minds and Hearts - Interview w/ Mariana Hebborn, PhD
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Mariana's LinkedIn: https://www.linkedin.com/in/mariana-hebborn-phd-118035117/
In this episode, Scott interviewed Mariana Hebborn, Lead of Data Governance for the Healthcare Sector at Merck Group Germany (not the US pharmaceutical company Merck & Co.).
Some key takeaways/thoughts from Mariana's point of view:
- It's crucial to answer why you are doing data governance. Is it for improving data quality? Better data security? Know what you are trying to achieve to best focus your efforts.
- Make it easy for people to understand how and why to share their knowledge with the rest of the organization. The people mindset really is the most important aspect of a successful digital and/or data transformation.
- Most everyone knows we need to move to federated data governance, but the big question is how. How can we do it safely? How can we evolve? It isn't a simple switch we can flip.
- To drive buy-in for moving away from a centralized data governance approach, we need to show the benefits of federated governance - when done well - versus a monolithic approach.
- At the end of the day, governance is about conversation and missioning - why should you care about governance? What value will it drive for your organization? Answer those questions first.
- We need to find ways to organize closer to the source to capture far more domain knowledge when sharing data. Centralized teams just can't understand the context in a large and complex organization.
- ?Controversial?: Most data access should be to packaged insights - the computational result - rather than the raw data itself. Most people consuming information want the insights, not the raw data.
- We need to take learnings from operations and microservices so we drive to clear boundaries, clear responsibilities, and easy access to the information people need. That will prevent data silos and keep us agile.
- Who should have access to what data is far less cut-and-dried than we'd like. It is much more about what's in the data and the specific usage. So domains need to understand some clear rules but make the decisions themselves, because they understand the data itself far better.
- "The best data owners and data stewards are found, not made."
- We need to get to a place mentally where data governance is so ingrained, there's never a question of whether we should be doing it.
- Start by looking to better govern the data that is already generating good value - or is otherwise important. Don't try to govern all data at the start.
- If data consumers aren't sure whether it's allowed and/or appropriate to use data in a certain way, it is on their shoulders to ask. Data owners and stewards can only set rules that go so far. Otherwise, data owners and stewards cannot feel comfortable giving others access to their data.
- Everyone should be able to browse what data is available even if they cannot have access by default. You can find more use cases that data consumers wouldn't have thought were possible or available.
- Lack of knowledge of the law and regulations does not protect you from the law and regulations. Don't be naive.
- "Centralize the knowledge within the domain."
For Mariana, when talking about data governance, the general industry consensus is that we need to get to federated governance - but the big question is how to actually do that. Governance needs and the pace of change are very different depending on the industry. Many industries are already adopting the federation mindset but are still struggling to do data governance well - it's either centrally managed or it's a bunch of silos. How can we get past that?
According to Mariana - and Scott - centralized data governance is a pragmatic approach. Until it isn't. At scale, centralized data governance is breaking in most - if not all - large organizations. So we need to look for ways to organize closer to the people with the knowledge about the data, so they can share the domain-specific context far better than anyone else, while still leaning on governance experts to keep domains aligned with the greater organization. Mariana believes we can win people over by clearly comparing federated versus the monolithic approach - for domains and for the greater organization. Show them what data governance means for them, why it matters, and why it benefits them, instead of trying to show them exactly what to do.
We have already figured out how to do cross-domain information sharing on the operational plane, in Mariana's view. That prevents the silos but is also not a centralized way of working. So we need to figure out how to do the same with data for analytics, taking a lot of the same learnings from the move to microservices. We need clear boundaries and accountability - it is needlessly confusing when we don't know who is responsible for what. In data, we need to focus on getting people access to the information they need to better the organization. Mariana knows the idea of central access and control feels like a good one - the central team knows governance best, right? But it just doesn't work well at scale.
According to Mariana, much of doing federated governance well is about changing mindsets. To get started with federating your governance, you need to find your data owners and data stewards. And per Mariana, "the best data owners and data stewards are found, not made." There need to be people on the ground in the domain ready to clean up - to tidy things up as best as possible before governance is federated to that domain. And we want to move towards more and better governance over time; you can't go from no governance to governing everything. Start with what matters and what drives value.
In many long-evolving industries or companies, people often ask whether we should even do data governance, in Mariana's experience. Really data-driven companies don't ask that - we need to get to a place mentally where data governance is so ingrained there's never a question of whether we should be doing it. But we also can't boil the ocean. So start by looking to better govern the data that is already very important, whether because it is generating value, is sensitive, is widely used, etc. And then figure out what you are trying to get from data governance - is it quality, better security around your data, something else? If you don't know, figure that out first.
Mariana believes crisp policies about compliance are very helpful for lowering the amount of effort people need to spend to stay in compliance. We can't have every domain team learn all applicable laws and regulations. Well-crafted policies mean there is less work interpreting what is and what is not allowed. Of course, this is far easier said than done.
At the end of the day, governance is about conversation and missioning, according to Mariana - why should you care about governance? It's very easy to fall into decentralized instead of federated governance, and decentralized is where you end up with data silos. There are information silos to overcome as well - knowledge is typically trapped in people's heads instead of disseminated through the organization. For Mariana, when doing federated governance, it's best to centralize the knowledge within each domain. And bring the rules to the data. Make it easy for people to understand how and why to share their knowledge with the rest of the organization. The people mindset really is the most important part of a successful digital and/or data transformation.
Quick tidbits:
- Most data access should be to packaged insights - the computational result - rather than the raw data itself. Most people consuming information want the insights, not the raw data.
- We need to identify the people who are already working with the data, who know who should have access and who already has access.
- Everyone should be able to browse what data is available even if they cannot have access by default. You can find more use cases that data consumers wouldn't have thought were possible or available.
- Accountability for data owners and stewards only extends so far. They need to own who should have access and what is proper use, but if someone misuses data, that is on the person or team misusing the data, not on the data owner or steward. Data consumers should ask if they aren't sure whether something is allowable/appropriate.
- Lack of knowledge of the law and regulations does not protect you from the law and regulations. Don't be naive.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf