Data Mesh Radio

Interviews with data mesh practitioners, deep dives/how-tos, anti-patterns, panels, chats (not debates) with skeptics, "mesh musings", and so much more. Host Scott Hirleman (founder of the Data Mesh Learning Community) shares his learnings - and those of the broader data community - from over a year of deep diving into data mesh. Each episode contains a BLUF - bottom line, up front - so you can quickly absorb a few key takeaways and also decide if an episode will be useful to you - nothing worse than listening for 20+ minutes before figuring out if a podcast episode is going to be interesting and/or incremental ;) Hoping to provide quality transcripts in the future - if you want to help, please reach out! Data Mesh Radio is also looking for guests to share their experience with data mesh! Even if that experience is 'I am confused, let's chat about' some specific topic. Yes, that could be you! You can check out our guest and feedback FAQ, including how to submit your name to be a guest and how to submit feedback - including anonymously if you want - here: https://docs.google.com/document/d/1dDdb1mEhmcYqx3xYAvPuM1FZMuGiCszyY9x8X250KuQ/edit?usp=sharing Data Mesh Radio is committed to diversity and inclusion. This includes in our guests and guest hosts. If you are part of a minoritized group, please see this as an open invitation to being a guest, so please hit the link above. If you are looking for additional useful information on data mesh, we recommend the community resources from Data Mesh Learning. All are vendor independent. https://datameshlearning.com/community/ You should also follow Zhamak Dehghani (founder of the data mesh concept); she posts a lot of great things on LinkedIn and has a wonderful data mesh book through O'Reilly. Plus, she's just a nice person: https://www.linkedin.com/in/zhamak-dehghani/detail/recent-activity/shares/ Data Mesh Radio is provided as a free community resource by DataStax. If you need a database that is easy to scale - read: serverless - but also easy to develop for - many APIs including gRPC, REST, JSON, GraphQL, etc. all of which are OSS under the Stargate project - check out DataStax's AstraDB service :) Built on Apache Cassandra, AstraDB is very performant and oh yeah, is also multi-region/multi-cloud so you can focus on scaling your company, not your database. There's a free forever tier for poking around/home projects and you can also use code DAAP500 for a $500 free credit (apply under payment options): https://www.datastax.com/products/datastax-astra?utm_source=DataMeshRadio

Latest episodes

Jan 16, 2023 • 1h 13min

#181 Learnings from BlaBlaCar's Early Data Mesh Journey: Positive Transformation for the People and the Organization - Interview w/ Kineret Kimhi

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/Please Rate and Review us on your podcast app of choice!If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see hereEpisode list and links to all available episode transcripts here.Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.Kineret's LinkedIn: https://www.linkedin.com/in/kineret-kimhi/Kineret's Blog Post 'Do’s and Don’ts of Data Mesh': https://medium.com/blablacar/dos-and-don-ts-of-data-mesh-e093f1662c2dIn this episode, Scott interviewed Kineret Kimhi, Analytics Lead at BlaBlaCar.Some key takeaways/thoughts from Kineret's point of view:!Interesting Decision!: BlaBlaCar reorganized their data organization but did not fully decentralize by embedding people into domains. Instead, they kept a central team but combined multiple functions into a squad around domains - a key domain might have a data engineer, data analyst, data scientist, and a software engineer.!Scott Mantra Too!: Sharing your experience - data mesh or otherwise - early and often with the broader data community means better and quicker feedback, not just internal experience. It's okay to be vulnerable about what didn't go well, you can get better info and help save others the same pain.?Crucial?: It's very important that when you split up your teams from functional data role teams, people keep in contact with functional role peers. If not, it can be very lonely as the only data engineer inside a domain. There is a significant turnover risk and a risk to not having scalable learning and knowledge transfer of data work if not handled well.Data mesh will lead to a lot of potential changes to people's ways of working, especially with each other. Don't shy away from that, people need to know you aren't forgetting they need career development and that you'll support them as they learn and get used to new ways of working.Communication really is the most important aspect of getting data mesh right. You need to get feedback and keep people informed, aligned, and in sync. Change is less painful if people are told why it is happening and are informed instead of discovering the results of change.Look to align as much of your data team at the start of a data mesh journey - get everyone involved in the plan as best as possible. But don't completely change your data team structure at the beginning, start small.On that note, starting small means there isn't a huge disruption to your ways of working and data org so a team/domain can get comfortable with new ways of working. You can get great feedback to make it easier/better for the domains that follow if you keep in close contact.When doing your data mesh PoC, look for simple use cases. Set your PoC up for success and focus on learning how to do data mesh rather than tackling the hardest challenges first.Reorganizing your data teams can be frustrating to non-data team stakeholders. You can't have everything drop to the wayside while you learn to do data mesh. There needs to be a balance between learning a new way of working and continuing at least some semblance of business as usual around data work so you don't cause major disruption to exec's ways of working.It's really easy to go wrong with governance early in a data mesh journey. Getting people on the same page and on the same tooling is crucial so there is a better shared understanding, e.g. lineage, observability, catalogue, etc.If you do not have key core aspects of data governance in place, get those in place before starting a data mesh journey or you will make it much harder on yourself than it should be.?Controversial?: Data documentation needs to get disrupted because it is still far too manual and too difficult for consumers to really understand without just experimenting with the data themselves.?Often Overlooked?: It's important to recognize data mesh isn't the right fit for every organization. Why are you looking to do data mesh? Does data mesh even address your data challenges?!Important!: Make certain to give your PoC the resources - including people - needed to succeed but also set it up to succeed. Give the space and limit the pressure on your PoC to really learn if data mesh is for you and if your organization is ready to do data mesh.As you get domain teams to start owning their data, it's not a switch to flip. It's a process, work with them to get them capable and don't ask them to do overly complex things as they learn. Crawl, walk, jog, run.Make the team participating in your PoC feel like the pioneers and the potential heroes. Make it as easy on them as is practically possible and try to keep as much load off them as possible. Look to make participating in the PoC as beneficial to them as possible and not a burden.Set up a strong governance process to prevent schema changes from causing unnecessary downstream pain. But that can't be on the centralized team to make the changes, that's a bottleneck and doesn't scale.Don't expect data mesh to suddenly solve all your data challenges :)Kineret started off the conversation saying she was previously running data engineering at BlaBlaCar but with their move to data mesh, that isn't really a necessary role anymore so now she is the Analytics Lead. This was part of their greater reorganization of their data org where they are organized around domains into squads instead of by functions like data engineering. So crucial domains might have a data engineer, software engineer, data analyst, and data scientist in one squad focused on their data. So BlaBlaCar has a central data team but each squad is essentially attached to a domain. They kept their chapters around functional roles to keep knowledge sharing high and promote more camaraderie between similar roles attached in a squad to the different domains. BlaBlaCar sees the value in sharing their experiences early and often with the general data community so they can take in external feedback and also help out others looking to do similar things with data. Scott note: If I had a nickel for every time I tried to preach this… :)On advice as to how to start a data mesh journey, Kineret was relatively insistent that you need to form a group of people to partner with on the transformation. No matter your title, you need to have people to lean on and get feedback from. You can't drive it simply by force of will. Some things BlaBlaCar did that she believes helped make their journey (more) successful were: getting broad alignment around their data mesh journey, including around planning. It wasn't just a small team of people, all the data team worked together. Second aligns well with Zhamak's advice too: start small. It allowed them to get people used to a new way of working instead of trying to shift the entire data approach of the organization at once. They focused on collecting feedback from everyone involved in the PoC so they could see how well it worked and so that future domains could replicate the successful parts and avoid the ways of working that didn't go as well.As part of their 3mo PoC, Kineret and team took in a LOT of feedback from that single team. The reorg made a number of the data squad attached to the domain feel lonely and disconnected from others in a similar functional role. Again, back to the chapters approach to keep people connected around their role functions. Kineret said she believes if they tried to move all the domains at once to data mesh, their journey would have failed and they likely wouldn't have kept their data people nearly as happy because they might not have implemented the chapters approach early enough.To keep your overall organization on board with your data mesh journey, it's important to think about how stakeholders interact with the data team and keep that stable while you are in transition according to Kineret. If key stakeholders across the organization have to go through an entirely new and different process with each domain, those stakeholders are not going to be happy. So plan ahead and communicate what changes and have things well documented, not just at the data product level but at the business process level. If only you could do data mesh as if it were a separate thing unto itself but it's part of the business strategy, you can't keep it in a bubble. For Kineret, there have been some data mesh transition pain points especially around people moving into different roles or day-to-day responsibilities but communication is the key to keeping everyone aligned and limiting the unnecessary pain that often comes with change. If someone is used to talking with Alice in the central team about challenges with data and then Alice is suddenly in another domain, there is some frustration and concern by that data consumer as to who to talk to and how can they get to a good relationship with their new contact. But a key goal of data mesh is to make data consumers' overall experience better. So while it might be a bit challenging as things change, keeping data consumers informed of change and making sure there are low friction processes to get what they need are crucial.Kineret believes data documentation as a whole needs to be disrupted. We generally have some necessary pieces but it's still overly manual to properly document and consumers still generally can't really understand the data without digging into it themselves. The documentation generally isn't capable of doing what's necessary to get someone up to speed on a data product. Even though her team is doing great, it's still a challenge to find the right mix of important to in-depth. And it's still very manual work to create the documentation. Scott note: very much true. Data documentation is still an incredibly difficult task to get right and is probably far more tedious than it should be.When asked for some general data mesh getting started advice, Kineret had some beginning questions instead. What does your general data governance look like? If it's not robust, you should look to set that up before decentralizing. You'll save yourself a lot of unnecessary pain. Second, what is your buy-in for data mesh like and what is the reason for thinking data mesh is the right choice to solve your challenges?* Data mesh isn't right for every organization. Third, are you really ready to do your PoC and give it the resources necessary to succeed? Are you setting your PoC up to succeed by not putting too much pressure or trying to tackle too hard of a problem? Lastly, can you find a PoC use case that is relatively contained so this doesn't have too much outside influence and too many stakeholders? Can you clear the space to make it possible to succeed?*Scott note: so much this; if centralization isn't you bottleneck, decentralizing is far more likely to cause more issues. Be realistic about what data mesh can change and what it can't. Don't use an excavator to dig a 3 inch deep trench in your garden…A non-standard approach Kineret and team took was by separating ingestion and putting that on the data platform team. So the data domain teams could focus on the cleaning and transforming data instead of setting up the extracting from databases or other data stores. This was part of BlaBlaCar's capability building and data ownership transition strategy. They didn't ask or expect the world from the data domains as they were learning. Find relatively simple things for them to do instead of the most complex data engineering tasks. Crawl, walk, jog, run.Heading into the end of the conversation, Kineret really emphasized how important it is to get the people aspects of something like data mesh right. Make sure people can feel seen and heard, keep people informed, keep people in touch with those doing the same functional roles so your technical folks don't get too lonely, etc. Really make sure you focus much more than most technical people probably want to - the tech is cool! - on making this a positive transformative experience for the people in your organization, not just the organization itself.Quick tidbits:Governance can be really difficult but very crucial early in a journey. You want everyone on the same page relative to ways of working but also on the shared governance tooling, e.g. observability, cataloging, and lineage.Make sure to treat your initial PoC domain teams like pioneers and give them the support and guidance necessary. Keep undue burden off them as best as possible, keep in constant contact for feedback, and look to make it as beneficial as possible to them. Celebrate the PoC domain team because they went and did the big, kinda scary thing that could transform your organization.Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see hereAll music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Jan 15, 2023 • 31min

Weekly Episode Summaries and Programming Notes – Week of January 15, 2023

Jan 13, 2023 • 1h 13min

#180 Shared Understanding Leads to Data Value That's Outstanding - Interview w/ Chris Dove

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/Please Rate and Review us on your podcast app of choice!If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see hereEpisode list and links to all available episode transcripts here.Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.Chris' LinkedIn: https://www.linkedin.com/in/charles-dove-b4715723/In this episode, Scott interviewed Chris (Charles) Dove, Data Architect at Endava. To be clear, he was only representing his own views.Some key takeaways/thoughts from Chris' point of view:Data used by only one use case in one way is not how you make money by leveraging data, it's too expensive. Set yourself up to reuse data and make sure the organization is aware of what data is available.?Controversial?: Tooling around data, especially metadata, has gotten better. But is it good yet? There are still some major fundamental gaps that seem like basic blocking and tackling around sharing data.Data isn't the point, it's merely a vehicle for exchanging information.Far too often there is an implicit understanding of a taxonomy/shared terms in different business units that is actually incorrect which leads to misunderstandings and mismatched data being treated as the same. But it's not easy to make all aspects of all parts of data explicit and easily understandable, we have to invest and find good ways to do that.!Incrementally Important!: Business people in domains often don't understand their own data because it's embedded in an application. So they only experience their data in a context that is already framed for them by the application. So they don't think about someone else not understanding the data inherently when those others _aren't_ experiencing it through the same application.Getting to a 'good enough' level of documentation is crucial to prevent misuse of data based on misunderstandings. But every organization has to figure out what is good enough and how to get there, what level of documentation is required, there isn't a blueprint.A constant challenge in data is implicit assumptions by producers around 'they work here so they know this' instead of documenting nuance. This leads to misunderstandings based on incomplete documentation.Beware the TLA - three letter acronym - in data documentation. It might have a lot of embedded context for those with domain knowledge but it's not helpful for those without the understanding. Focus on explaining the concepts at a level an outsider can understand.?Controversial?: The point of data literacy isn't to teach everyone technical skills, it's to get to an understanding of their own data and how to share the context of their data so others can get a decent understanding of their data. If a good use case emerges, we don't need everyone to be able to create and maintain a data product but we need an organizational understanding of what data might be available and what it means.Tribal knowledge is a double edged sword. It's great your organization has that knowledge but it's a massive risk point. Get it out of people's heads as much as possible. Scott note: find low friction ways like interviewing people to extract their context. Like this podcast...Most organizations don't have a great data documentation strategy or practice. It's better to get going with learning how to share information about your data than try to make your data documentation or documentation strategy perfect upfront. Get something in place and recognize technical debt but something is better than nothing if it's not the end state.Trapped metadata - where tools try to enforce a closed system instead of easily offering up critical information for ingestion by other systems - is a persistent problem that doesn't seem to be getting better. It's even worse when you write custom code to do transformations because most people aren't creating the necessary metadata at all.We need vendors to be bought in that publishing their metadata is the right move and head towards metadata standards to make creating a more complete picture via metadata easy/feasible. Some vendors are moving in that direction, notably Atlan and data.world.Truly getting people to change the way they think and feel - not just do work - is an incredibly difficult challenge that most companies don't ever really address. Make the change in ways of working a value-add to actually change hearts and minds.To do data mesh well, we need to figure out scalable and highly effective ways of communicating changes - just offering data product versions won't cut it. To do that, data producers need to know how their data is used as well.Companies need to really see and understand the business benefit from their data before they are likely to change their ways of working around data. That can be a chicken-and-egg issue though.Chris started with his view that while tooling is getting better in general, most tools are still very lacking in how their metadata plays into the greater organizational view of data which means we can't do some pretty basic things. Or at least there aren't comprehensive tools that make easy sharing of context easy across teams because of many metadata incompatibilities/challenges. So we need to get to a way to show the semantic context-related metadata as well as the transformation metadata in one place that is also understandable by the business users. A hard order to be sure but fundamental to enabling the vision of companies being actually data-driven.There is a very common problem in organizations that comes from an implicit taxonomy and homonym problem according to Chris. The classic example is the definition of a customer but it goes far deeper than that. Some bit of data often has a very specific meaning in a source system and/or domain but then a different business unit looks at it with their own interpretation of what it means and misses the nuance, the differences. So do you have an enterprise taxonomy or do you try to document the exact meaning differences or do you not let people have access to data in case they misunderstand? Not as easy of a choice as many would like to prevent these misinterpretations or misaligned data mixing.An interesting and very crucial nuance Chris mentioned about data sharing: the business people in a domain are often consuming their own data through a different lens. The data is embedded into an application for them so the interface makes much of the nuance, the meaning explicit. But that meaning isn't included in the data by default - the column title in the table doesn't have that meaning. So those business people - the data producers - often struggle to understand why people are confused or don't get the nuance. So it's important to make sure the data producing domain understands the interface others use to consume their data. Scott note: this is benefit I hadn't considered of having domains consume from their own data products. If nuances about the data aren't explicit, if the documentation isn't good enough, will they get confused about their own data? Will that force them to do better in building their data products?Chris hit on a common problem many are having in data mesh - and data in general: what depth of documentation and explanation is necessary for data to be useful and not misused? People automatically assume some level of knowledge of the domain simply because others are in the organization. 'You work here so you understand my domain' type of attitude. So we need to make sure people can at least understand what they don't know and give them a way to get up to speed on what they need to know about a domain. Self-service can be a recipe for disaster if people can't understand when they are missing the necessary understanding/meaning.While domain-specific acronyms can have a lot of embedded information in them for people with knowledge of a domain, they are often a major hindrance to those trying to learn about the domain in Chris' experience. Instead of focusing on exactly what your team calls everything, focus on the concepts and why they matter. As Shakespeare said "what's in a name", don't be enamored with sharing context via domain specific language. Referring back to Vlad Khononov's DDD episode, the internal domain language is the ubiquitous language but the published language is what is used to share with the rest of the organization. Focus on that published language - how can things be understood easily by those outside the domain?For Chris, the point and meaning of data literacy isn't what most think - it's about getting people to understand what data they have and the general meaning/context so it can be communicated with the rest of the organization. It's understanding how data can be used and shared, not the exact technical aspects. It's about getting to a capability to share context and understand other's context around data without getting overly technical. When there is a use case that emerges, not every single person in the company needs to be able to create and maintain a data product. Basically, the concepts matter far more than everyone learning SQL.In Chris' view, tribal knowledge is a very dangerous place to be. You have amazing and extremely valuable knowledge but it's trapped in people's heads. What happens if they leave? We all know about tribal knowledge but it's especially important in data because again the context and nuance, not just the column name, matters :) So extract that valuable tribal information, get it into a consumable format for the entire organization. It frees up the time of your most knowledgeable people too as they aren't answering questions all the time - extract once but leveraged by many :)Good documentation, good knowledge sharing isn't about anticipating every challenge and writing out the fix or entirely preventing it according to Chris - that's not feasible. It's okay to get things into a knowledge base rather than the perfect metadata tool at first. You want to improve but if you are waiting for the perfect solution, you won't ever move forward with your data documentation. So get something out, put it in front of others, ask for feedback, and improve. And as stated, documentation doesn't have to answer all questions - it's something to make sure people generally understand a certain set of data and if they have a deeper question, they have a clear question escalation path for who to ask.It's easy to get lost in data by focusing on 'data as the point' in Chris' experience. Data is merely a vehicle for exchanging information. But there are lots of interesting technical challenges in dealing with data so data people often lose the plot. Without the context around the data, it's useless too, so we have to focus on delivering it as one packaged unit. Scott note: this is what Zhamak keeps referring to as a data product container or a unit of data - that it isn't merely the 1s and 0s but the context, the user experience, the lineage, etc. wrapped in one package so it is usable as is.For Chris, the biggest issue right now in data, especially for something like data mesh, is the trapped metadata problem. It's something Scott has mentioned repeatedly: most tools that touch your data in some way at best generate metadata that is trapped in that tool or it's extremely difficult to extract and integrate that metadata into other tools. And when people write custom code to do transformations, they often don't generate the metadata at all! So trying to get the full necessary picture of what's happening around our data is extremely time-consuming and difficult.Chris called out the need for specification around metadata so we can at least bring it all into one place. Only a few vendors have moved to making it possible to even extract most of the metadata they create - he noted Atlan and data.world - but hopefully more vendors are pushed - or dragged - into doing the same. OpenMetadata or other early projects may provide a good way to start developing some standards for how things are described, shared, and/or stored. But again, trapped metadata is a lock-in pain that vendors are seemingly unwilling to let go of unless their hands are forced.To really move forward with how we all approach data - as an industry and at the organizational level - we need to change the way people think and feel about data according to Chris. But change forced upon people only _might_ change their way of working and usually doesn't. So we have to focus on changing hearts and minds or the behavioral changes won't actually net the necessary care changes to the ways of working and understanding we need in data. Easier said - to change hearts and minds - than done but actually changing how we work requires empathy not mandates.Chris finished on two points: the first is to really change the way an organization does data, they have to understand how data fits into their overall strategy and how treating it as a product impacts the work. Something may be valuable now but that value might fade. It's okay - and even very healthy - to end of life any data work that is no longer valuable. And the second point is that reuse is really key to generating strong business benefit from data. The cost of getting data to a point you can leverage it is typically high, look to make it reusable and find valuable ways of reuse as much as possible.Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see hereAll music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Jan 10, 2023 • 19min

#179 Reliability Engineering for Data and Data Mesh - Mesh Musings 40

Jan 9, 2023 • 1h 7min

#178 Data Modeling in Data Mesh Panel - w/ Juha Korpela, Kent Graziano, and Veronika Durgin

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/Please Rate and Review us on your podcast app of choice!If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see hereEpisode list and links to all available episode transcripts here.Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.Juha Korpela (Chief Product Officer at Ellie Technologies) facilitated this panel on data modeling in data mesh with Veronika Durgin (Head of Data at Saks) and Kent Graziano (The Data Warrior, former Chief Technical Evangelist at Snowflake). This panel was hosted by the Data Mesh Learning Community in partnership with Data Mesh Radio. Veronika's Links:Veronika's LinkedIn: https://www.linkedin.com/in/vdurgin/Data Vault North America User Group: https://www.meetup.com/dvnaug/Kent's Links:Kent's LinkedIn: https://www.linkedin.com/in/kentgraziano/Kent's Website: https://kentgraziano.com/Kent's Twitter: https://twitter.com/KentGrazianoData Vault Alliance: https://datavaultalliance.com/Juha's Links:Juha's LinkedIn: https://www.linkedin.com/in/jkorpela/Ellie Technologies' website: https://www.ellie.ai/This write-up is from Scott Hirleman's point of view:As someone without a ton of depth in the data modeling concepts, here are some of my key takeaways that should be taken with a grain of salt :) I decided not to write up everyone's opinions but more what are my takeaways:Start from the business concepts instead of the technical (a data mesh theme).Focus on enabling people to do the data modeling instead of trying to do it centralized but do have a centralized understanding.It's crucial to not only consider data modeling at a data product level, that's the route to data silos.Communication is the most important data modeling skill.Share early and often to get feedback and work together on that fast feedback cycle to quickly iterate.Limit your blast radius, only bite off what you can chew and limit what can go sideways early as you learn - do not cause lots of downstream damage.If you don't stay connected to each other and communicate well, you'll likely have MDM (master data management) style nightmares in data mesh.Centrally define the standards and rules and have a clear way to get help and settle questions/disputes.Starting data modeling from technical integration is about what can we do right now. We should focus on what we need to do to drive business value, not what is possible based on the existing solution.In data modeling, far too often people don't look at what data consumers want. User requirements, business requirements, and technical requirements all must be met.Not everyone needs to be a data modeler but everyone should understand how information is communicated internally via a data model.Alla Hale method (episode #122): show up to every meeting with something to discuss. Don't show up to the first meeting with it built but constantly show something to get feedback on.Shared understanding is crucial and can ONLY be reached via strong communication. Stop trying to shy away from communication.Make the implicit explicit - "what do you mean by that?" is one of your most crucial tools to doing data modeling well.Data modeling in data mesh will probably be far more similar than dissimilar to data modeling in a more centralized world. The focus on the business concepts is crucial. Far too often we try to start from the technical instead of what are we trying to achieve and it's crucial to not fall into that trap. Getting the technical aspects for interoperability wrong can be a pain but if things work together technically but not at the business level, that's a lot of sound and fury signifying nothing - essentially that's a lot of cost for work and compute that doesn't lead to actual business value.One thing I'll note is every one of the guests is a Data Vault proponent. I'm not sold that it's the right way in the long run for data mesh - I feel like we need to evolve data modeling concepts for a more distributed, federated organizational approach. But from what they said, Data Vault does sound like a very solid base to start from - start from the business concepts first and focus on what you are actually trying to accomplish. Data modeling for the sake of data modeling is not something anyone should want to do.As with just about everything else in data mesh, data modeling should be about limiting your blast radius of potential negative impacts as you get to fast initial and incremental feedback. Get to that iteration, take on things that matter but don't make it a big bang. Fail fast and all that :) This is not about taking requirements and going off to your own world at the domain level. It's even more crucial to have overall communication/cohesion as we enable more and more people/domains to own their data.The most important aspect for developing lasting interoperability via data modeling is the business concepts. It's not that hard to do the technical interoperability once you figure out how things should work together. Starting from the technical feels easier but is a recipe for losing the value of the bigger picture. Typically, technical-based implementations are less extensible because the technology decisions are embedded into the solution instead of an enabling factor. Veronika said something like "focus on the words and meanings and not the data types." She also said that starting with technical integration focuses far too much on what can be done with the existing implementations instead of what do we need to drive value and how can we improve the existing implementations to create more value.As with most things in data and software engineering, it's okay to build your data model in an opinionated way but maintaining flexibility, especially as you are doing your initial development, is crucial. Major change can be quite costly - especially if you have to change the entire foundation of what you are doing. Kent railed against the need for rework. Build your data model and subsequent data products so providing another view or angle isn't nearly as difficult and requires only the work to do that, rather than changing everything else you've done to also accomplish the new data view - in data mesh, this would often be a different API or a new table sharing the data for a different use/perspective. Again, look to prevent rework and ensure flexibility.A massive concern with data mesh is data silos, the worry that if you have a bunch of domains doing their work separated and not in communication, nothing will interoperate. So you probably do need some kind of centralized group - whether that is their main role or part of their other responsibilities - helping domains do their work in the context of the greater organization. Note, that is what loose coupling from microservices means. Fully decentralized would be no connections versus things work together but can scale independently - that is decoupling. Having people who are there to help is the key to federated instead of fend-for-yourself so data architects are a crucial aspect of data modeling in data mesh.While there is no centralized data model in data mesh - they aren't flexible and mean you lose a ton of context from trying to force things to comply with that model - there obviously can be centralized guidance and direction, a standard set of data models, etc. Think about a well-functioning federated government - maybe not the US… There are people doing work in the centralized function but it's about enabling those at the more local level to do things the right way. Juha quoted someone with something like "governance is not about leading people to do things right, it's about setting them up to do things the right way". That centralized team can't know what's right for specific situations - because they lack the localized context - but they can specialize in enabling doing things the right way. Kent claimed there is an enterprise data model and that can quickly go the wrong direction but if you interpret it well - that there are clear relationships across the business that are crucial to model well in your data, that are fundamental business truths you should reflect in your data - it can mean much less learning of deep domain specific context because you understand how domains fit the organization.A number of people believe data modeling must be all about one view or perspective to rule them all. That is where data mesh fundamentally pushes back. You can have one view you agree on as an organization - such as revenue - but others should still be free to publish something that is similar in meaning but from another view. Much like in Domain Driven Design (listen to Vlad's episode for more, #171), there should be a 'language' (broad definition, think interface and terminology) of the domain to maximize the context of information shared in the domain and a separate 'language' used to communicate to the rest of the organization. That way we can still maximize context for business value locally but communicate globally in order to also maximize global data interoperability, which is a crucial organization-wide business value driver.Kent mentioned another worry of data mesh that is often closely aligned with the data silo worry: master data management-related nightmares. While we absolutely have to reinvent MDM for data mesh - look for a few panels on that in the near future? - it's pretty clear it's bad to potentially have 10 different definitions that might filter to someone who doesn't understand the nuance and differences. Especially if that exec asks a simple-seeming business question and gets 5 different answers. Data trust is gone. So we have to be clear in tackling that problem and strongly communicating. Maybe not mastering data but mastering ways of answering typical questions?Overall, I think you will learn a ton just like I (Scott) did :)Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see hereAll music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Jan 8, 2023 • 29min

#175 Ethical Data Usage - Informing and Educating Consumers - Interview w/ Esther Tham

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/Please Rate and Review us on your podcast app of choice!If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see hereEpisode list and links to all available episode transcripts here.Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.Esther's LinkedIn: https://www.linkedin.com/in/esthertham/In this episode, Scott interviewed Esther Tham, Experience Designer at Thoughtworks. Scott reached out to talk about data ethics based on a post Esther made on LinkedIn.Some key takeaways/thoughts from Esther's point of view:When designing your UX (user experience), companies should aim for as little friction as possible when signing up or transacting. For an ethical company, that means collecting as little information as possible to still maximize value of the service to the user.Companies: if you don't need it, don't collect it! It isn't ethical but also it increases your attack surface for a data leak and potentially lowers consumer trust.We don't have the proof points of many companies doing the right thing and disclosing potential issues of sharing information with them in an understandable way. But that would likely increase consumer trust. Is that trust worth more than the hassle to a company? We need companies willing to try being more ethical to really know but it's a cost with a very uncertain upside so not too likely.People need to learn that their personal data has value - and risk - associated with it. Don't give it over without thinking about how it might be used/misused. But most people are nowhere near that thought process yet. Right now, most people are only worried at most about getting scammed, not should this company have my data and how might they misuse it.Ethics isn't just about collection or even usage, protection is also crucial. If you can't protect sensitive information, you shouldn't be collecting it.How can we encourage the general population to really care about ethical collection and use of their data? Is it just better explanation of how it's used? With greater understanding, will most people actually care?The question of who is responsible for ethical data collection is an interesting one. On the one hand, companies should be behaving ethically. On the other hand, they often don't so how much responsibility to protect sensitive information is on the consumer not handing it over in the first place?A designer's role is to advocate and build for the user. But we still don't really know exactly what most users want when it comes to data ethics. Do they really care about ethics around their data or are they willing to trade their data for services? Is it about more education/communication or do users genuinely not care? We need brave companies willing to test.How do we press companies to be more ethical in the data they collect, how they protect data, and how they use data? Have many companies suffered damage - reputational or otherwise - from ethics breaches? We clearly can't trust every organization...On the flip side, how do companies that are actually doing the right things ethically communicate that to consumers? Is there a value to consumers where they will seek out companies with high ethics? Is the cost of behaving ethically worth it, does it result in a tangible benefit? We assume there is an additional cost to behaving ethically too so there needs to be an upside for companies to consider it.It's easy for consumers to have a false sense of security online relative to their data. While identify theft and similar issues are on the rise, companies are still asking for - and consumers are regularly freely giving - sensitive PII.Very few people really think about potential misuse of data we give to private companies, often with little explanation by those companies of what they will use our data for. Can we really expect companies to fully explain their projected use of data when that might simply confuse more people? We can press them to do it but likely not expect them to willingly.However, when companies do try to explain their use of data, does anyone read it? Are EULAs actually useful? Do we need something that is in addition to a EULA to explain how data will be used and what will be collected?Most people seem to not really be all that concerned with the data they share until it seems it was used improperly, especially if a company sold their data to a partner or some scammer got ahold of it. And most don't expect a scam to happen to them so they only look to react after the data is already out there. Can we change their approach and view?Esther started the conversation with her background of growing up in Singapore, one of the safest large cities in the world. But that kind of environment can also lull people into a false sense of security when transacting online. Even the government was not really thinking about misuse as birth certificates, passports, and national ID numbers were all the exact same number until 2006. And until 2016, companies could ask for your full national ID number and many people gave it out without a second thought to get access to services.The Singapore government has pretty stringent requirements to only collect information for legitimate purposes and store it securely. The few incidents of people in their government accessing the private information of others has led to strong repercussions. Basically, it's not like many of the other countries out there with potentially strong laws but little to no enforcement. But when it comes to private companies, even in Singapore a number are still asking for a significant amount of data without giving clear justification or expected usage.For Esther, creating a low friction user experience is a bit of a double-edged sword sometimes relative to ethical collection. Do you store someone's credit card information to make transacting easier next time? Do you try to collect as much information as possible to show them relevant ads? Let's be honest though, that second one is about selling ads not a 'better ad experience'… So, how do companies effectively balance how much information to collect in order to provide a great experience but not too much where you are getting things you don't need?A fantastic point Esther brought up is that data ethics isn't only collection and usage, though that's often where we focus. Data protection, especially around sensitive information, is a major ethics challenge/question. If you can't protect the data, should you really be collecting it? If you are in possession of sensitive data, what level of duty do you have to protect it? Are you salting and hashing your passwords and other sensitive information?Again, Esther believes we have to think about who is the burden on to protect information. If a company collects it, presumably the burden should be on them. But if people willingly give up very sensitive information to companies, what level of responsibility do we place on people to be informed and smart about giving their data away? Who is the 'shame on' for the second time something happens? How do consumers protect themselves from companies like the US credit bureaus that collect information without consent? Especially when they haven't been ethical in their level of data protection.Another interesting point Esther raised is how do companies properly explain what data they will collect and how will they use it. Be honest, show of hands, how many actually read through most of the EULAs you agree to? Is that because they are insanely tedious or because we genuinely don't really care or have resigned ourselves to data being misused anyway? Is that just a friction point in the onboarding experience?In Esther's view, most people seem to not really be all that concerned with the data they share until it seems it was used improperly, especially if a company sold their data to a partner or some scammer got ahold of it. And most don't expect a scam to happen to them. So consumers need a better way of understanding their information attack surface and to not give information as freely but that's mostly on companies to not ask for it. And if they can profit off it, can we really expect companies to stop asking? Is this a chicken and egg scenario where neither side is really going to move first?If a company does really do a good job of disclosing potential risk of handing over data and exactly how they are going to use it, that presumably increases customer trust according to Esther. Unless people just don't want anyone to use their data but they are happier when they don't have to think about it. And then, for a company, is that increased trust worth the hassle or even the potential friction and scaring certain users? It might be the 'right' thing to do ethically, but how many companies are _really_ focused on acting ethically? So we aren't sure if this is a high cost for companies and we aren't really sure it leads to anything all that positive and we aren't sure consumers really care… so why would a company do it unless they have to or feel the need to act ethically? Are any companies brave enough to test it out?Esther wrapped the conversation up with a call to action: understand that your data has value and consider if giving away your data is worth the value a company is providing you. And how could it be misused. That doesn't mean be paranoid but also, don't give out information quite as easily.Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see hereAll music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Jan 1, 2023 • 15min

Weekly Episode Summaries and Programming Notes – Week of January 1, 2023

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app