
Data Mesh Radio
Interviews with data mesh practitioners, deep dives/how-tos, anti-patterns, panels, chats (not debates) with skeptics, "mesh musings", and so much more. Host Scott Hirleman (founder of the Data Mesh Learning Community) shares his learnings - and those of the broader data community - from over a year of deep diving into data mesh.
Each episode contains a BLUF - bottom line, up front - so you can quickly absorb a few key takeaways and also decide if an episode will be useful to you - nothing worse than listening for 20+ minutes before figuring out if a podcast episode is going to be interesting and/or incremental ;) Hoping to provide quality transcripts in the future - if you want to help, please reach out!
Data Mesh Radio is also looking for guests to share their experience with data mesh! Even if that experience is 'I am confused, let's chat about it' on some specific topic. Yes, that could be you! You can check out our guest and feedback FAQ, including how to submit your name to be a guest and how to submit feedback - including anonymously if you want - here: https://docs.google.com/document/d/1dDdb1mEhmcYqx3xYAvPuM1FZMuGiCszyY9x8X250KuQ/edit?usp=sharing
Data Mesh Radio is committed to diversity and inclusion, both in our guests and our guest hosts. If you are part of a minoritized group, please see this as an open invitation to be a guest - hit the link above.
If you are looking for additional useful information on data mesh, we recommend the community resources from Data Mesh Learning. All are vendor independent. https://datameshlearning.com/community/
You should also follow Zhamak Dehghani (founder of the data mesh concept); she posts a lot of great things on LinkedIn and has a wonderful data mesh book through O'Reilly. Plus, she's just a nice person: https://www.linkedin.com/in/zhamak-dehghani/detail/recent-activity/shares/
Data Mesh Radio is provided as a free community resource by DataStax. If you need a database that is easy to scale - read: serverless - but also easy to develop for - many APIs including gRPC, REST, JSON, GraphQL, etc. all of which are OSS under the Stargate project - check out DataStax's AstraDB service :) Built on Apache Cassandra, AstraDB is very performant and oh yeah, is also multi-region/multi-cloud so you can focus on scaling your company, not your database. There's a free forever tier for poking around/home projects and you can also use code DAAP500 for a $500 free credit (apply under payment options): https://www.datastax.com/products/datastax-astra?utm_source=DataMeshRadio
Latest episodes

Jul 28, 2023 • 22min
#243 Zhamak's Corner 26 - The Fundamental Data Need: Autonomy with Interconnectivity
Takeaways:
- It's important to understand that we need enablement to do data mesh well - enablement through technology and enablement through organizational approaches/behavior changes. Doing only one will likely not work.
- "...they need to move fast, they cannot be bogged down by centralization of any kind, organization or technology." Scott note: we discuss later the need for centrally provided enablers, but central bottlenecks are the speed and flexibility killer - look to prevent and remove them where possible.
- People want to simply produce data as a normal part of doing their job and make it consumable for the rest of the organization. How can we enable that? Why is it so tough? How do we make it interoperable - and more importantly interconnectable - too?
- Right now, the missing core component to do data mesh well is an easy ability to create and manage data products. Everyone is having to cobble things together and then trying to layer on the observability, the access control, the interconnectivity, etc. But it's built on a shaky foundation.

Sponsored by NextData, Zhamak's company that is helping ease data product creation.
For more great content from Zhamak, check out her book on data mesh, a book she collaborated on, her LinkedIn, and her Twitter.
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Data Mesh Radio episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman.
Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Jul 24, 2023 • 1h 18min
#242 Making Data Accessible Makes Your Data Work Successful - More on PayPal's Data Mesh Journey - Interview w/ Kim Thies
Get involved with Data Mesh Understanding's free community roundtables and introductions: https://landing.datameshunderstanding.com/
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Kim's LinkedIn: https://www.linkedin.com/in/vtkthies/
Gemba Walk explanation #1: https://kanbantool.com/kanban-guide/gemba-walk
Gemba Walk explanation #2: https://safetyculture.com/topics/gemba-walk/
PayPal Data Contract Template OSS: https://github.com/paypal/data-contract-template/tree/main/docs
Start with why -- how great leaders inspire action | Simon Sinek | TEDxPugetSound: https://www.youtube.com/watch?v=u4ZoJKF_VuA

In this episode, Scott interviewed Kim Thies, at time of recording a Leader on the Enterprise Data Team at PayPal and now SVP, Client Innovation & Data Solutions at ProfitOptics. To be clear, she was only representing her own views on the episode.

Some key takeaways/thoughts from Kim's point of view:
- When talking about data mesh to execs, it's helpful to go back to basics: "these are the four main principles, and this is what we've built and why." Scott note: I recommend you slightly alter the phrasing, especially around "Federated Computational Governance" ;)
- Look to Simon Sinek and "Start with Why". Always investigate the why for the other party. What would be enticing to your business execs to lean in on data mesh? Data mesh for the sake of data mesh is not going to win over the business.
- The communication and relationship building aspects of data work are often overlooked and will serve you better than just about any architectural or technology decision. Build the relationships so your great data work will address actual business challenges and be leveraged!
- ?Controversial?: Similarly, learn 'the art of the conversation' so you can extract from people their needs/wants and then see how you can help them meet those. "What isn't going well where you can help?" is a great question to find new use cases where the stakeholder will be engaged - you're directly addressing a business challenge for them.
- It's probably better to start from needing a solution where data mesh is a good fit rather than using data mesh as a hammer and looking for nails. You shouldn't want to do - or at least shouldn't propose doing - data mesh for the sake of doing data mesh.
- If you have a clear business use case, it's much easier to get people engaged and keep them engaged/involved - around data mesh or data work in general. Look to a tangible benefit - e.g. cost savings is a pretty easy first use case to go after.
- ?Controversial?: Data teams - especially data engineers - need to spend more time "experiencing" how customers/users actually use data to deliver on their business objectives. It will lead to better outcomes and better relationships with the business leaders/teams.
- Leverage the Gemba Walk philosophy: walk the 'factory floor' and talk to people far more often. Ask them how they get their work done. It doesn't need to be overly formal - just collect information to help others do their job better with data.
- You don't get to rest once you've gotten initial approval. Execs' attention will not last; they will start to focus on other challenges. Keep pointing to the business challenges you are addressing - not the data work itself - to stay relevant and near top-of-mind.
- It's not as if every aspect of your business starts doing a data mesh approach when you start your journey. There will be compromises, and other parts of the business will likely choose other approaches. That's okay and normal. Build your momentum and successes around data mesh but accept it won't be right for everyone, especially at first.
- Data mesh will be received differently by every person, or at least every persona. Each of the pillars might resonate differently. So be ready when speaking to focus on the aspect that is getting them to lean in. You need to balance the four pillars in your journey but not every conversation :)
- There is probably far more transformation needed in your data practices than you expect. Even after hearing that, there is still probably far more than you expect. Processes and hearts + minds especially.
- In the current business environment - likely headed for recession - you might be able to get people bought in on data mesh simply for time-savings for the data team. In downturns, cost cutting becomes far more attractive.
- Once you get your mesh journey going and you have some interesting capabilities to offer, it's important to go out and find additional use cases. Even if you're proving a lot of value, people still probably need a little convincing or at least some additional understanding of what you're doing - they won't all come to you.
- To speak the same language as your business partners, "you have to listen first." It's pretty easy to assume you mean the same thing, but even foundational phrases like data product and data contract often have completely different meanings for people within the same organization.
- It's incredibly easy to overlook user experience in data. Don't fall into that trap! Scott note: we did a data user experience panel if you want to dig deeper - episode #190.
- ?Controversial?: Domain ownership is probably the most important aspect of data mesh because it puts the data back in the hands of the people who really know it best. Context loss is such a prevalent problem in data, and data mesh solves for that quite well.

Kim started off with a bit about the PayPal journey to data mesh. They weren't looking to do data mesh; they had a specific business problem of disparate data sets across multiple domains that needed to be combined with decent governance and observability. It just so happened that data mesh was a great fit, so they went the data mesh route. There actually was also an engineering-led conversation of the engineering team looking for a use case to apply data mesh, but that didn't end up being the data mesh initiative that went to production. There was also a bit less scrutiny on the business-led journey because it was within a larger line of business/domain rather than being the core data team strategy.

One thing Kim believes is that data teams, especially data engineers, do not spend enough time really understanding and "experiencing" how customers use data. They should go and pair closer to the business teams to deliver better solutions. That will lead to better outcomes but also better relationships. This is that aspect of data UX that is often overlooked. People want something to make their jobs easier or make them perform better, not just data. So how does data actually intersect with that?

To actually do that 'experiencing', Kim and team leveraged the Gemba Walk philosophy (see links above for a deeper dive). It's a Japanese concept of going and 'walking the factory floor' to collect information. You don't need to do super formal interviews - meet people where they are and ask them questions about their day-to-day. Get a sense of what really matters and how they do work. If you just ask people what they want, they might give you the Henry Ford 'faster horse' answer versus you discovering their points of friction.

Kim discussed the challenges of not just getting but retaining buy-in and attention for your data mesh implementation - or really any data work. Once you get the approval, there are probably things that are more top of mind for your business partners, especially compared to the specific data work parts. So keep circling back to the business challenges you are addressing in conversations to stay near top of mind. Kim and team accomplished that by getting to a really rapid, tangible proof of concept. They quickly had something to show from the work that made it clear this could work at scale.

Starting from entirely inside one business unit meant Kim and team had a lot of autonomy. As soon as they were brought more into the central data team, things changed and there was much more focus on communication and sharing - and compromising too - especially around technology choices and approaches. Many teams were taking different approaches to similar challenges with the same technology, so how do you get to best practices?

One thing that really stuck out to Kim about data mesh and driving buy-in is that different pillars resonate for different personas. Data scientists or even data engineers embedded into a domain often love the self-serve aspect - not even for consumption but being able to actually self-serve their own data production needs. They aren't afraid of the ownership principle of data mesh because they can own their own timelines and not get stuck in centralized bottlenecks. They might not even realize they are struggling pre data mesh because the central bottleneck/friction is so ingrained in their way of working; you can really make things far better for them, but they wouldn't have considered asking.

Kim believes you should be prepared to market your mesh capabilities once you have them up and running. That doesn't have to be a sales-y approach, but you want to go and find additional challenges people couldn't solve before but now could with the mesh to gain more momentum, converts, and funding. Learning the art of the conversation is crucial. Is there a way that you can advance their business goals and address their challenges? If yes, they are more likely to lean in and stay bought-in to the use case. Don't be afraid of a few sales and marketing tactics to get to better business outcomes for all.

It's very easy to use the same term and mean something different - just look at all the 'what is data mesh' or 'what is a data product' content out there. So Kim believes that when talking to your internal partners, start from listening first so you understand how they are using different terms. It's easy to jump to solutioning instead of understanding, but your solutions will be lacking without the understanding :)

It's pretty easy to lose track of the customer experience in data mesh in Kim's view. There are so many things to focus on and honestly, UX design in data hasn't really been a thing. As an industry, we hadn't talked much about UX design for the platform or the data product before data mesh started to force the conversation. If you have all the capabilities in the world but a bad user experience, are people really going to use what you've built?

Circling back to communication, Kim talked about the challenges of communicating with execs and getting bogged down in the work done. She said it's important to start from baseline communication around data mesh: "these are the four main principles, and this is what we've built and why." If you start there, then people can really start to connect the work to what might be beneficial for them. And as she stated earlier, still look to start from the listening and understanding aspects first and then always start from the why. Scott note: see the YouTube link above.

In wrapping up, Kim talked about the through-lines of the conversation. Establish and build relationships, not just data products. Learning the actual problems/challenges is key. Talk to people where they are before you try to build something for them. Understand what's actually going on and where their friction points actually are. And then also look for scalable business cases - the best way to discover those is by establishing the relationships :)

Learn more about Data Mesh Understanding: https://datameshunderstanding.com/about
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
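Since the episode links PayPal's open source data contract template, here is a tiny illustration of what 'checking data against a contract' can mean in practice. This is a hypothetical Python sketch with an invented contract shape - it is not the PayPal template's actual schema, so see the linked repo for the real format:

```python
# Hypothetical minimal data contract check (invented shape, NOT the
# PayPal template's schema). A contract declares expected fields and
# types; a validator reports which parts of a record violate it.
from typing import Any, Dict, List

CONTRACT = {
    "dataset": "orders",  # logical data product name (illustrative)
    "fields": {"order_id": str, "amount": float, "currency": str},
}

def violations(record: Dict[str, Any], contract: Dict[str, Any]) -> List[str]:
    """Return a human-readable list of contract violations for one record."""
    problems = []
    for name, expected_type in contract["fields"].items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"wrong type for {name}: got {type(record[name]).__name__}")
    return problems

good = {"order_id": "A-1", "amount": 9.99, "currency": "USD"}
bad = {"order_id": "A-2", "amount": "9.99"}

print(violations(good, CONTRACT))  # []
print(violations(bad, CONTRACT))   # ['wrong type for amount: got str', 'missing field: currency']
```

The point Kim makes stands regardless of format: a contract is only useful once producer and consumer agree on what the terms in it mean.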

Jul 23, 2023 • 16min
Weekly Episode Summaries and Programming Notes – Week of July 23, 2023

Jul 21, 2023 • 18min
#241 Data Product Success Metrics - A Kinda Deep Dive - Mesh Musings 51
Key summary points:
- At the start, it's more important to start measuring than it is to measure the right things. Do NOT let analysis paralysis hold you back.
- Similarly, your success metric measurement framework will probably suck to start. Oh well, get to measuring.
- Create a framework and tooling/platform capabilities - where necessary/useful - to make measuring and reporting against success metrics simple. That framework should be about defining the metrics and especially how to measure, not what success looks like for individual data products.
- Use fitness functions.
- Good metrics to consider, in order of usefulness: user satisfaction, user value, data quality, time to business decision, delivery to expectations, time to update (can be squishy), and usage.
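The fitness-function idea above can be sketched in code. Below is a minimal, hypothetical Python example (all names and thresholds invented for illustration, not from any specific tool): each fitness function scores one success metric between 0 and 1, and a shared framework standardizes how metrics are measured and reported while leaving targets to each data product team.

```python
# Hypothetical sketch of a data product "fitness function" framework.
# Each function scores one success metric (0.0-1.0); the framework only
# standardizes how metrics are measured and reported, not the targets.
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class DataProductMetrics:
    user_satisfaction: float        # e.g. survey score normalized to 0-1
    quality_check_pass_rate: float  # share of data quality checks passing
    weekly_active_consumers: int

FitnessFn = Callable[[DataProductMetrics], float]

@dataclass
class FitnessFramework:
    functions: Dict[str, FitnessFn] = field(default_factory=dict)

    def register(self, name: str, fn: FitnessFn) -> None:
        self.functions[name] = fn

    def evaluate(self, metrics: DataProductMetrics) -> Dict[str, float]:
        # Uniform reporting: every data product reports the same metric names.
        return {name: round(fn(metrics), 2) for name, fn in self.functions.items()}

framework = FitnessFramework()
framework.register("user_satisfaction", lambda m: m.user_satisfaction)
framework.register("data_quality", lambda m: m.quality_check_pass_rate)
# Usage is the least informative metric, so cap its contribution at 50 consumers.
framework.register("usage", lambda m: min(m.weekly_active_consumers / 50, 1.0))

scores = framework.evaluate(DataProductMetrics(0.8, 0.95, 30))
print(scores)  # {'user_satisfaction': 0.8, 'data_quality': 0.95, 'usage': 0.6}
```

The design choice mirrors the episode's point: the framework owns the measurement plumbing, while each product team decides which thresholds actually count as success.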

Jul 17, 2023 • 60min
#240 Driving to Better Healthcare Patient Outcomes Through Data - Interview w/ Smriti Kirubanandan
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Smriti's LinkedIn: https://www.linkedin.com/in/smritikirubanandan/
Smriti's HLTH Forward Podcast: https://hlthforward.buzzsprout.com/

In this episode, Scott interviewed Smriti Kirubanandan, a Healthcare and Public Health Data Expert at a large consulting firm. To be clear, she was only representing her own views on the episode. Many of the challenges and opportunities discussed in this episode are more on the US side because of the not-so-well-functioning healthcare system there.

Some key takeaways/thoughts from Smriti's point of view:
- In healthcare, it's easy to lose sight of the patient in the data - focusing solely on a condition, an area of the body, or a set of data instead of a person. It's vitally important to look at the data through a lens of treating the patient as an entire person.
- !Controversial!: It can sound time consuming to interact with data "in a much more intimate format", much like a 1:1 conversation, but it's very important to drive to better outcomes. Instead of automated decisioning, we can point our tooling to compile the relevant information better to make decisions faster without removing the care or the person. Machines making automated decisions leads to worse patient outcomes.
- "Obviously, privacy is important. Ethics is important. How do we interconnect this data and how do we get to communicate amongst" the payers and providers? So physicians can look at a much more complete picture of the patient to treat them better.
- There are many organizations collecting important health data about people. We need to rally around the patient outcomes instead of the financial outcomes and combine the data. Easier said than done, of course.
- ?Controversial?: Companies with important health data need to lean forward at the table and buy in on collaborating around sharing data, or we will continue to have suboptimal patient outcomes.
- More organizations should make it possible to 'act local' relative to individual health. Instead of every decision being a very complex one, can we make things easier to simply make health progress, if not 'fix' everything for someone's health? Basically, make it easier to make small decisions around more concrete and focused areas, much like a domain in data mesh.
- It's very important to empower people to leverage their own health data, so we have to focus on getting them access and then giving them the power to do something with their data to drive better outcomes.
- There are 3 big issues we need to tackle simultaneously: 1) How do we give access to relevant and useful data to caregivers? 2) How do we ensure digital equity? And 3) how do we share data ethically?
- Think about interoperability - can I pull data from one system to integrate into my system - and interconnectivity - a more two-way interoperability/integration. We need to focus on interconnectivity far more.
- Especially in something as important and complex as healthcare, it's crucial for the data and engineering people to stay focused on target outcomes and not get lost in the code/work. A shared vision at the project and organization level is key.
- Many data projects go wrong because we still struggle with communication. Not that we aren't communicating, but keeping all parties to data work aligned and in sync as learnings emerge is very hard. And data work needs to allow gray areas, which it often doesn't do well currently.
- Value-based care is a really important aspect of getting people the best care, and data can help support that well. But it requires a lot of ethics and transparency to get there.
- ?Controversial?: Digital twins of actual people could change healthcare in major ways. If done right, they could greatly improve the ability to treat patients because of the ability to test against negative health outcomes and find more optimal treatment plans.

Smriti started out the conversation with a bit about her background and then jumped right into a key challenge in data around healthcare today: treating the patient as a person and not a set of data points and measurements. How do we look holistically at a person and focus on what would be best for their health AND life - the two are intrinsically linked? We still want to drive insights, but personalized care, at least in the US, seems to be on the way out, and we can bring it back better with better data.

Can we actually interact with healthcare data at the person level instead of at the billing code level? Smriti believes we can - that instead of letting the automated tooling make the calls on important health decisions, such as whether key procedures or tests are authorized by insurance, we can use the tooling to allow a more "intimate" interaction with what the person is going through and how we can serve them best. We can better leverage tools to make more humane decisions for folks.

For Smriti, it's still a very tough challenge: how do we get data to interact across the various healthcare data silos, how do we smoothly exchange this data, and how do we tackle the governance of making the data interconnect? Right now, physicians cannot see a large amount of crucial patient data; but is it on patients to connect the data between offices and facilities so doctors have a more complete view? How do we maintain privacy if we are sharing information across systems? What about ethics - do we really want to give a lot of these companies very intimate health data? Scott note: see the recent acquisition of One Medical by Amazon - they are now supposedly requiring patients to waive their HIPAA rights to get care.

Because there are so many challenges around integrating healthcare data across so many systems/silos, Smriti believes that one company by itself can't make that big of an impact on the overall system. BUT each company being better about doing their part can help achieve a data-driven aspect to healthcare that leads to better patient outcomes. There needs to be more of a concerted effort to collaborate in the right ways.

In Smriti's view, it's very important to empower people to make better health decisions for themselves driven by data. That means giving them access to more of their data, giving them the capabilities to leverage that data to make decisions, and then empowering them to actually act on those decisions. There are some pretty basic things we can do to improve the health of our fellow citizens, and it's on a number of people to keep the pressure on to move forward on that. No one entity can do it alone, but we should all be pressing for better solutions.

Smriti talked about 3 big challenges to sharing our data and driving better patient outcomes. The first is how do we actually get data in front of our caregivers? How can we empower the individual to share that data, and how can physicians or other caregivers access it and drive better patient decisions? The second is how do we ensure digital equity? Many people don't have access to good internet, and many are not digitally literate enough to actually participate in data sharing - how do we empower them to participate in better health outcomes? And the final challenge is how do we actually share this data ethically and with empathy? All of these are being worked on, but it takes a very large cross-org contingent to move things forward. Everyone can play a part, but it will take a lot of collective work.

Interoperability versus interconnectivity is something Smriti is passionate about because interoperability doesn't really ensure that two systems can share information all that well - it might be that your data is in a proper format but your definition is way different than mine. Interconnectivity, in her definition, is about a two-way collaboration and easy integration between systems around data. That interconnectivity is necessary to really supercharge our health data revolution :)

Smriti talked about a challenge in data that many past guests have touched on: how do you keep people focused on target outcomes instead of the minutiae of the work, keeping them from getting lost in the code instead of what you are trying to achieve? It's key to have a shared vision about what the goal is and why you are doing the work. And if people lose sight of that, you need to bring them out of the weeds or you'll get very interesting solutions that don't solve actual important problems.

When asked why data work seems to not net expected results so often - the 80%+ of analytics initiatives don't meet expectations statistic - Smriti pointed to difficulties in communication. Not that we aren't communicating, but the challenge is how do we quickly iterate together and share small-scale incremental learnings so reality and expectations are not constantly drifting apart. Basically, communication is hard and we need to place more focus on getting it right - and having a wider tolerance range initially of what 'right' means - but there's no silver bullet. We need to be able to be vulnerable with each other and operate in gray areas :)

Smriti really believes in the concept of value-based care. But to get there, we need transparency around price and care. Individuals need to have access to their information but also need to equip themselves with the knowledge of how to leverage their data to get better care. It isn't all on the care workers.

Digital twins in healthcare are something Smriti is really excited about. A digital twin of a person gives providers an ability to potentially test reactions to different treatment protocols, optimizing positive outcomes and hopefully minimizing negative outcomes. Physicians can test a number of treatments simultaneously without experimenting on the patients themselves :) Healthcare digital twins are in their very early stages, but she is quite excited about the possibilities.

Jul 16, 2023 • 16min
Weekly Episode Summaries and Programming Notes – Week of July 16, 2023

Jul 14, 2023 • 1h 4min
#239 Panel: The Role of Data Product Management in Data Mesh - Led by Frannie Helforoush w/ Alla Hale and Jill Maffeo
Please Rate and Review us on your podcast app of choice!
Get involved with Data Mesh Understanding's free community roundtables and introductions: https://landing.datameshunderstanding.com/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Frannie's LinkedIn: https://www.linkedin.com/in/frannie-farnaz-h-a7a11014/
Jill's LinkedIn: https://www.linkedin.com/in/jillianmaffeo/
Alla's LinkedIn: https://www.linkedin.com/in/allahale/
In this episode, guest host Frannie Helforoush, Technical Product Manager/Data Product Manager at RBC Global Asset Management (guest of episode #230), facilitated a discussion with Alla Hale, Senior Data Product Manager - Digital Capabilities at Ecolab (guest of episode #122), and Jill Maffeo, Senior Data Product Manager at Vista (guest of episode #151). As per usual, all guests were only reflecting their own views.
The topic for this panel was broadly data product management and the role of the data product manager in a data mesh implementation. Data Product Manager is still a very nascent role so there is still a lot of confusion around it :) If I were to sum up the feeling of the conversation very succinctly, it would be: it's early days, have patience.
Scott note: I wanted to share my takeaways rather than trying to reflect the nuance of the panelists' views individually.
Scott's Top Takeaways:
The role of the data product manager is pretty wide-ranging. It's easy to get overwhelmed and not focus on what really matters.
We have to be patient as we learn best practices around data product management because it's still a nascent space.
It's crucial to focus on who the user of a data product will be - whether that is an intermediate user creating a report for another or someone directly consuming from the data product. It's very easy to lose sight of the exact use case and how people will use the data product - to consume information. Data for the sake of data is just expensive 1s and 0s.
There isn't even a common understanding of what a data product is, no standard definition. So data product management is even tougher to define in many ways. It is still a very emerging practice so everyone is still figuring it out. It's okay if things are a bit messy and muddled; they are for everyone else too.
Because there isn't _really_ a tangible UI (UX is also a bit muddled) to a data product, it's really hard to get a good understanding of the boundaries of a data product. People don't have experience with data products like they do with most types of products, digital or physical, so you have to have some patience as people figure it out.
There are a ton of learnings we can bring from physical and digital/software product management to data products. Some things only need small tweaks to work well. But be prepared for lots of trial and error, so make more room for experimentation than you would in software.
What is a data product team - or is there a data product team - is a question every organization has to ask. And the answer for each org probably evolves too. At first, as you are figuring out how to build and manage data products and your platform is immature, you probably have a team specific to data products.
But eventually, for many data products/domains, there will likely simply be a data product developer that is part of the product team, or data product development will be among the general team's developer duties and product management gets a bit more fuzzy.
Product marketing is a relatively foreign concept in data products. But it seems there needs to be far more interaction with existing and potential users of a data product to add value to existing use cases and create new use cases. As of now, that responsibility probably falls on the data product manager, unfortunately.
The best path to developing a very valuable data product is 'consuming stakeholder' engagement. If the consumer isn't engaged, if they aren't giving the necessary information to really develop the data product to solve a use case, consumption seems to be below expectations across every org I'm seeing.
Other Important Takeaways (many touch on similar points from different aspects):
User Experience (UX) is such a crucial part of essentially all product management - how does that play in data products? Is it on the platform team to create the UX and the data products just plug into that UX? Is that just the UI? How do documentation, number and types of APIs, interoperability/interconnectivity to other data products, etc. factor in?
We need more adoption of forward-thinking/leaning product management practices in data. Instead of only being reactionary to requests, how are we going to extract what people want next?
Data products are so new and so many people have preconceived notions of what the phrase should mean or does mean. Be prepared for lots of work on alignment just around the definition of data product.
Be prepared for friction when introducing good software product management practices to data.
Many data people aren't used to really interacting with product management/managers, so they will require some time and hand-holding.
Getting buy-in for the budget to put the proper amount of data product management/managers in place might be a little difficult, especially when you start to ask where that budget comes from. Make sure to have these conversations early in your journey and adjust as needs arise.
Similarly, a common issue in software product management is a PM being too overloaded to talk to customers or potential customers. That's even more likely to happen in data product management if you aren't careful. Really focus on making sure there is enough time for product discovery or you will be doing development by request only.
Defining data product success metrics - broadly and then specifically for each data product - is going to be a struggle. Usage is an indicator but not comparable across data products, for instance. That doesn't even get into the challenge of then measuring the success metrics. But you just have to start, and starting with mediocre metrics is better than not starting at all.
How the heck do we think about A/B testing relative to data and data products? Do we have enough consumers to test? What does it even mean to A/B test data?
Similarly, experimentation is key to doing data better. But can we experiment around the data itself? Or are we just using data to power and measure experimentation? Maybe experimentation around interfaces to data?
Data products have a lifecycle. If you are unsure of the value of a data product to users and you can't really get them to articulate it, sometimes shutting down the data product - at least temporarily to do the 'did anyone scream' test - is going to be the right call.
It's crucial to balance long-term product sustainability with time to market.
As you learn more about building data products, many organizations are seeing decreasing time to market for new data products. But especially at the start of a data mesh journey, it is really easy to create non-sustainable/maintainable products that match requests instead of needs in a productized way.
We can create data products that answer questions that 'should' be asked, but should we? If we are sharing insights that are not ready to be leveraged or are not what the stakeholders care about, what will adoption be like? Thus far, it seems like most data mesh implementers are saying adoption is below expectations for proactive data products not created for specific stakeholder-defined use cases. Hopefully that changes as we increase data literacy, but it's important to consider.
It's crucial to consider interoperability and cross-domain usage/queries as part of your potential user flow, but it's also very hard to try to map ahead when you aren't sure of those use cases. We can easily lock ourselves in and reduce flexibility, which then might become like a data warehouse - just micro-warehouses everywhere…
It will be interesting to see where data product managers come from - if they come from traditional product management, they have to learn a lot of the hidden nuances of data. But coming from data, it's easy to get too tied into the data work instead of what the data product is supposed to do.
Figuring out where data product management reports will be important for organizations. It could be into the data team, the product org, the domain, or maybe even the CTO. There are definitely puts and takes to each. Personally, I think they should start reporting into the data org and eventually move to being part of the product org or reporting into the domain itself, because data ownership and data products are just a part of a domain's mandate.
Frannie asked a great question: should the data product manager own the data product strategy?
And what even is a data product strategy? I feel like the answer is yes but that data product strategies will be very immature as people learn how to build and manage data products. Who should own the overall strategy across all data products?
It all starts from the user and use case. To build a good data product, you need to understand how people are going to use it. But then, are we building one-off instead of universally usable data products? I think you start from a use case and build out, expand out but maintain as much flexibility as possible.
Product discovery - the process of finding out what products you should build and what are the useful features to add to them - is really important in product management. But it's far more challenging in data because everyone always says yes to 'do you want more data?' How can you determine what will be used and valued? Where is a real use case versus a 'that would be nice to have'?
Data product management should extend into what data we want, what incremental data we should generate, not only what data we have. As Stephen Galsworthy talked about, especially with hardware, you often don't get to change what info you are collecting. Data sourcing is going to be part of data product management.
Maybe good data product marketing is just being highly available and willing to explain - having office hours and show and tells. If trying to generate new use cases from unengaged potential users isn't driving great value, maybe it's all about inbound marketing, not outbound.
Much of the value in data mesh is in cross-domain use cases, and that value is captured by combining information from different data products. But it seems we are still in the early days of figuring out how to communicate some of the potential questions out there to design good data products to answer them.
Learn more about Data Mesh Understanding: https://datameshunderstanding.com/about
Data Mesh Radio is hosted by Scott Hirleman.
If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Jul 10, 2023 • 1h 10min
#238 Bringing Software Testing Best Practices to Data - Interview w/ Sofia Tania
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Tania's LinkedIn: https://www.linkedin.com/in/sofia-tania/
Presentation: "Data Mesh testing: An opinionated view of what good looks like": https://www.youtube.com/watch?v=stNZQESndAA
In this episode, Scott interviewed Sofia Tania (she goes by Tania), Tech Principal at Thoughtworks. To be clear, she was only representing her own views on the episode. Scott asked her to be on especially because of a presentation she did on applying testing - especially important for data contracts - in data mesh.
Scott note: I was apparently getting extremely sick throughout this call so if I ramble a bit, I apologize. Tania's dog also _really_ wanted to be part of the conversation so you might hear us both chuckling a bit about her antics. And Tania has some really great insights, so I asked her probably the hardest questions of any guest to date. She did a great job answering them though! A lot of the takeaways are about whether we are actually ready to do a lot of the necessary testing to ensure quality around data, which I don't think has a clear answer yet :)
Some key takeaways/thoughts from Tania's point of view:
We have to bring software best practices to data but we should do it smartly and not make the same mistakes we made in software - let's start from a leveled-up position.
Zhamak has said the same. The question becomes how, but looking at how practices evolved in software should bring us a lot of learnings.
Just pushing ownership of data to the domains won't suddenly solve data quality challenges. The new owners - the domains - have to really understand what ownership means and what quality means for use cases leveraging their data.
A reasonably good way to measure if your data product is 'good enough' regarding data quality is to look at your SLOs (service level objectives) and SLIs (service level indicators). If you are constantly hitting the SLIs, you can probably focus more on new features. If not, you need to improve your quality.
!Controversial!: Consider almost a zero-trust approach to testing for data. Test as data is flowing through the systems, as data lands. Test in development as to what changes might impact data. And then consumers should be writing tests against source data to prevent issues. Scott note: that's a lot of tests, but how important is certain data to your org?
In a decentralized ownership model, many data consumers are less likely to trust data - at least at first - so you need to show them why they can trust it at the data product level. Leveraging proper testing and data contract strategies is crucial to being able to prove out data quality.
You should look to build out a robust testing and observability framework as part of your platform. Data product owners/domains shouldn't have to build it out manually themselves.
If you only have detection of data quality issues once data hits production, once something or someone is potentially already using it, that's an issue. Look to create ways to test data at the data product development stage, as part of the CI/CD. We can't rely only on lagging quality indicators if we want to up our data quality game.
Data for analytics and AI is even more complicated than on the operational plane.
Generally, data hasn't been transformed multiple times on the operational plane, so if there is an issue, it's either in the source application or an issue with the API call that was made. In data, we have to develop smarter tests as data flows through the pipelines.
Data producers need to define quality data in terms of what consumers actually want/need. Instead of arbitrarily setting quality levels, what do the consumers want?
Consumer-driven testing in data sounds wonderful. But it's hard to see teams being willing to do it :) We need better tooling and ways of working to make this easier.
Data quality surveys of data consumers are important for a number of reasons but are lagging indicators. They should be used to help develop appropriate SLAs/SLIs for data products and monitor if data products are generally meeting customer needs.
?Controversial?: Can a data producer really develop a custom test for their data product for each consumer, or does the consumer owe it to the producer to develop tests to ensure the data product continues to serve their use case well? Scott note: this could start a LinkedIn war but it's an important question to ask!
If you push for consumer-driven testing, don't be surprised at a lot of pushback. That still happens even in the API world, where it's been more accepted for years :)
Are consumers ready and able to programmatically define what good data quality means for them for each use case? There are some tools that can help but practices and tooling are still mostly nascent.
?Controversial?: Many consumers still have the 'give me all you have and I'll sort through it' mindset. Trying to get them to lock into what they are consuming will be hard.
There can be a real chicken-and-egg scenario around data products, especially testing. Consumers don't know precisely what they want and what will best suit them until they see the data/data product.
But building out a data product and having to change it a lot in response to customer feedback is also tough - producers want to build it once instead of in 10 iterations. Just be prepared for this to be an ongoing issue in data and for it to lengthen times to data product release.
?Controversial?: Having your transformations handled by low-code/no-code solutions can easily hurt you more than it will help you. Be very wary. Scott note: this is coming up in A LOT of conversations recently and was featured in the Thoughtworks Radar released in early May.
In software development - including development for data - abstractions are crucial but can get you in trouble. Really think deeply about your abstractions because it's easy to lose sight of what underlies the abstraction. And abstractions of abstractions of abstractions just compound the issue :)
Tania started with a bit of her background, especially related to data mesh. She worked with one of the clients that was an inspiration for Zhamak's original data mesh blog post and spent 2+ years as the lead on the technical side of a data mesh implementation at another client. Her background as a developer and tech generalist has shaped her thoughts around bringing good software practices to data.
For Tania, the reason she originally put together the presentation on software testing practices in data was a client question (paraphrased): 'if we already have data quality issues in the centralized setup with clear ownership and people who really know data, how the heck are we going to _improve_ data quality by pushing data ownership into the domains?' It's a very fair question - just pushing ownership without the capability and buy-in to own the data is possibly (likely?) going to lead to worse quality. So we need tests that work and can be shown to consumers to help ensure quality and trust in that quality.
Showing people your kind of data quality certification goes a long way towards trust.
In Tania's view, much of the existing data observability tooling and practices, while valuable, only really alert when there is already a problem that's hit production. Is there a way we can shift testing left - not just in ownership, but testing earlier in the flow of data? Earlier in the development timeline of a data product? So that is 3 potential ways to shift left, to test earlier. Think about detect versus protect - can we prevent data quality issues instead of only better identifying and resolving them?
Tania talked about how data product producers need to start to shift their thinking around data quality. What specifically do my consumers want - and why? Quality is inherently subjective, so extract from them what their needs are and look to serve those. And we should stop using _only_ lagging quality indicators like surveys. They are valuable in reshaping what SLAs should be and whether a data product is meeting needs and expectations, but they are certainly not designed to quickly detect issues. But do consumers actually know what would make data 'high quality' for them?
Consumer-driven data quality testing is a good idea for many reasons in Tania's view. When we think about a single data product having 5 known, regular data consumers, does the data producer need to develop 5 different sets of tests to specifically protect against breaking changes or issues specific to each use case? Do they have to define quality metrics differently for each use case of the same data product? Do they have to be so familiar with each use case that they evolve their tests as use cases evolve? How much can we reasonably ask the data product consumers to do in the testing space to ensure quality?
But Tania admitted she hasn't led a client in doing consumer-driven testing for data. It's really hard to get data testing right in general - are people really ready for doing consumer-driven data testing?
We don't really have the tooling or the general best practices to do it well yet. And there is also just philosophical pushback - being forced to programmatically say what good quality means, instead of saying 'the data quality isn't good enough', is a tough pill to swallow for consumers. Do consumers really know precisely what they want? Tools like Great Expectations or Soda Core are a good start here but we need more. And many consumers are still in the 'give me all the data you have' kind of mindset, so reducing the possible scope of data they get is not an easy mindset shift.
Tania also pointed to a persistent chicken-and-egg challenge in data: data producers can't build exactly what consumers want until they get feedback from the consumers. But the consumers don't know exactly what they want until they've seen an early iteration of the data product. So you have lengthening time between conception and release because both sides need more from the other to move forward but can't until they get at least some information. A good way to press consumers might be to ask them about bad-case scenarios - what has to be there and why? That will _possibly_ prevent kitchen-sink feature requests.
As the conversation transitioned into low-code/no-code tooling, Tania lamented the difference between ease of use and simplicity. While low-code/no-code tools can be very easy to use at the start, as the scale/complexity of use cases increases, they often become extremely difficult to manage. They are focused on ease of use; their architecture isn't about maintaining simplicity of managing the solution as it scales. As you add more and more views, you might actually have 30-40 joins across many data products and performance comes to a halt.
This was also mentioned in the Thoughtworks Radar that was released in early May 2023 (Tania contributed to that).
In wrapping up, Tania shared what she believes is a good way to measure if you are doing well enough with your data product, especially in regards to data quality. Look at your SLOs (service level objectives) and SLIs (service level indicators) - are you hitting those regularly? Then maybe you can focus more on new feature development. But if not, you might need more/better monitoring/observability.
Data Mesh Radio is hosted by Scott Hirleman.
If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
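To make the SLO/SLI framing from this episode a bit more concrete, here is a minimal, hypothetical sketch of how a consumer's quality expectations might be expressed as indicators checked against agreed objectives. This is not from the episode or any specific tool - the record shape, indicator names, and thresholds are all invented for illustration:

```python
# Hypothetical sketch: a consumer encodes its quality expectations as simple
# programmatic checks; the producer compares the resulting indicators (SLIs)
# against agreed objectives (SLOs). All names/thresholds here are made up.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class OrderRow:  # toy record in an imagined 'orders' data product
    order_id: str
    amount: float
    updated_at: datetime

def compute_slis(rows, now):
    """Compute simple service level indicators for a batch of rows."""
    total = len(rows)
    complete = sum(1 for r in rows if r.order_id and r.amount >= 0)
    fresh = sum(1 for r in rows if now - r.updated_at <= timedelta(hours=24))
    return {
        "completeness": complete / total if total else 0.0,
        "freshness_24h": fresh / total if total else 0.0,
    }

def meets_slos(slis, slos):
    """True only if every indicator hits its objective."""
    return all(slis[name] >= target for name, target in slos.items())

now = datetime.now(timezone.utc)
rows = [
    OrderRow("o1", 19.99, now - timedelta(hours=1)),
    OrderRow("o2", 5.00, now - timedelta(hours=30)),  # stale row
    OrderRow("o3", -2.00, now - timedelta(hours=2)),  # negative amount
    OrderRow("o4", 42.50, now - timedelta(hours=3)),
]
# Example consumer-agreed objectives (assumptions, not from the episode)
slos = {"completeness": 0.95, "freshness_24h": 0.90}
slis = compute_slis(rows, now)
print(slis, meets_slos(slis, slos))  # both SLIs land at 0.75 -> SLOs missed
```

Run as part of CI or as data lands, a failing `meets_slos` result is the 'protect, not just detect' signal discussed above - and because each consumer can contribute its own `slos` dict, it also hints at what consumer-driven testing could look like in practice.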

Jul 9, 2023 • 31min
Weekly Episode Summaries and Programming Notes – Week of July 9, 2023
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Jul 7, 2023 • 21min
#237 Zhamak's Corner 25 - We Don't Have to Jerk the Wheel - Making Smaller Correction Decisions to Get to Our Data Destination
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Sponsored by NextData, Zhamak's company that is helping ease data product creation.
For more great content from Zhamak, check out her book on data mesh, a book she collaborated on, her LinkedIn, and her Twitter.
Takeaways:
We should be thinking about how we can get out of batch mode into streaming mode. Yes, technologically, but also think about how we can get to making decisions based on smaller amounts of data more frequently - both for automated systems like AI and for our people. Instead of making adjustments or decisions based on big batches of data, we can make smaller course corrections.
"Data mesh is about building responsibility into data and the quality of the data you share and being explicit about that quality."
Make the cost of mistakes that much smaller by creating smaller decisions that add up to the bigger decisions - it's not one giant leap, it's many steps that can avoid more hazards as you come across them.
"Make decisions at the speed of the market" is crucial to being nimble - being able to react to opportunities or new challenges. To do that, we need to put data in the hands of those closest to the market, the domains.
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Data Mesh Radio episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf