
Data Mesh Radio
Interviews with data mesh practitioners, deep dives/how-tos, anti-patterns, panels, chats (not debates) with skeptics, "mesh musings", and so much more. Host Scott Hirleman (founder of the Data Mesh Learning Community) shares his learnings - and those of the broader data community - from over a year of deep diving into data mesh.
Each episode contains a BLUF - bottom line, up front - so you can quickly absorb a few key takeaways and also decide if an episode will be useful to you - nothing worse than listening for 20+ minutes before figuring out if a podcast episode is going to be interesting and/or incremental ;) Hoping to provide quality transcripts in the future - if you want to help, please reach out!
Data Mesh Radio is also looking for guests to share their experience with data mesh! Even if that experience is 'I am confused, let's chat' about some specific topic. Yes, that could be you! You can check out our guest and feedback FAQ, including how to submit your name to be a guest and how to submit feedback - including anonymously if you want - here: https://docs.google.com/document/d/1dDdb1mEhmcYqx3xYAvPuM1FZMuGiCszyY9x8X250KuQ/edit?usp=sharing
Data Mesh Radio is committed to diversity and inclusion. This includes our guests and guest hosts. If you are part of a minoritized group, please see this as an open invitation to be a guest - just hit the link above.
If you are looking for additional useful information on data mesh, we recommend the community resources from Data Mesh Learning. All are vendor independent. https://datameshlearning.com/community/
You should also follow Zhamak Dehghani (founder of the data mesh concept); she posts a lot of great things on LinkedIn and has a wonderful data mesh book through O'Reilly. Plus, she's just a nice person: https://www.linkedin.com/in/zhamak-dehghani/detail/recent-activity/shares/
Data Mesh Radio is provided as a free community resource by DataStax. If you need a database that is easy to scale - read: serverless - but also easy to develop for - many APIs including gRPC, REST, JSON, GraphQL, etc. all of which are OSS under the Stargate project - check out DataStax's AstraDB service :) Built on Apache Cassandra, AstraDB is very performant and oh yeah, is also multi-region/multi-cloud so you can focus on scaling your company, not your database. There's a free forever tier for poking around/home projects and you can also use code DAAP500 for a $500 free credit (apply under payment options): https://www.datastax.com/products/datastax-astra?utm_source=DataMeshRadio
Latest episodes

Feb 20, 2023 • 1h 10min
#196 Data is a Team Sport - Learning to Collaborate Through Data - Interview w/ Andrew Pease
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Andrew's LinkedIn: https://www.linkedin.com/in/andrewpease123/
In this episode, Scott interviewed Andrew Pease, Field CTO of North Europe at Salesforce. To be clear, he was only representing his own views on the episode.
Some key takeaways/thoughts from Andrew's point of view (mostly written by him):
- Sensitizing people to data and improving their data fluency can be a challenge. Lots of people have had some less than perfect past experiences - perhaps a dry, abstract class has given them "statistics trauma". It's important to make it digestible for them to get started.
- Organizations typically evolve into silos so IT systems/approaches often evolve into silos too - Conway's Law. The bigger those organizations and silos are, the harder they are to bridge / the deeper the divides.
- Much as we'd like one, there is not a single silver bullet architecture for all organizations to overcome these silos.
- Without relevant IT architectures and processes, it can be challenging to put relevant and timely data and actionable insights into business people's workflows. You won't get it "perfect" the first time, but get started and learn to improve through experience.
- You should reiterate to people that data is there to augment their role, not to replace it. It's there to help them be more efficient and successful in their work. That's a key part of data fluency: not just understanding how to use data but where data can help.
- Feedback loops are very important to increase data quality levels and data value. It's important to build in these loops to make end-users feel like they are a part of a constant and never-ending improvement exercise. It shouldn't be a big burden, but data quality is a team sport.
- It's important for data consumers to understand not only the potential of data and analysis, but also the limitations. E.g. you can't reliably score lead quality from simply a person's name and their email address. The data needs to be representative enough to find useful patterns.
- AI should be perceived in the enterprise as augmented intelligence - it is there to make the human in the loop better, not replace them.
- It is crucial to inform operational teams, the data producers, about what data might be needed in the future, not just now. And then incentivize those data producers to actually create and maintain quality data products. If all we do is ask, it likely gets lost in the operational "priorities".
- Anecdotal feedback on what data is being used and is useful is great. But it's not going to tell the actual full story. Make sure to create ways to track usage and measure impact of data work.
- Data hackathons can be great ways to set up some cross domain collaboration and improve data fluency but also knowledge of other domains and the organization as a whole.
- It's vital to figure out how to get people excited about data, in combination with incentivizing them to do so in appropriate contexts.
- IT and the business side need to meet and collaborate in order to make data a crucial and embedded aspect of everyone's roles.
- As always, communication is crucial, especially around reorganization of data teams and competencies. A clumsy reorg will certainly alienate - and possibly infuriate - people.
- "The most complex system that we have in our organizations isn't a computer, it's the people who are operating the computers." When we think about the composable enterprise, we need to think about the humans in the loops and how and where they interface.
- Look to have a standardized way to bring people to better data fluency. Many different roles have budgets for ongoing training in their field; everyone should have that for data, and it should be part of any organization's new employee ramp period too.

Andrew started off by discussing the general way that organizations evolve. It's pretty natural for most to evolve into silos and the larger the organization, the deeper the divides between the silos and the harder it is to bridge those divides. Per Conway's Law, IT systems/approaches then often develop into silos too. It takes a lot of intentionality to prevent evolving into silos or to lessen divides that have already formed. And there is no "silver bullet architecture" to overcome the challenges silos create or undo the silos.

One of the big dreams of being data driven is putting timely and actionable data - "what do you want to tell them?" - in the workflows of business people. But, according to Andrew, many organizations attempting to do that look at it as an all-or-nothing kind of goal and that's just not reasonable. You won't get it "perfect" at the start. And that's okay; it doesn't make it not worth doing. As part of that process, it can be very important to reiterate that data is there to help people, not replace them - AI should mean augmented intelligence, it's there to help the human in the loop be better.

There are two major opposing forces re data quality in Andrew's view. First, you never get a second chance to make a first impression, so your data quality has to be up to a certain level before showing it to potential consumers. But conversely, the only way to get to actual quality data - essentially what matters, why it matters, and what quality levels are acceptable - is to get data in front of consumers and then iterate towards the required quality. Feedback loops are crucial to actual data quality so you can optimize for what matters. Your data consumers must understand that data quality is a team sport so they need to participate too.

Andrew brought up his concept of "statistics trauma" when discussing improving people's data fluency - essentially, many have a bitter taste from past statistics/math and/or data related work/school. So to get execs more data driven, you need to sensitize them to data but in a careful approach. That falls to the CDO and it can be challenging but is quite rewarding when it works. It's as much about communication as anything else in data.

In data, Andrew believes there need to be far more bi-directional conversations. Data consumers need to tell data producers what they need, and that can include data that doesn't exist yet so the producers need to start capturing it. So the earlier a data consumer can tell a data producer about their needs, the more likely they will get what they want down the line. Data mesh helps there because it's not the central team trying to understand and take requests to the producers. By cutting out the data team in the middle, you have a better chance of getting to what data consumers want more quickly. But we can't lose sight of something that many seem to overlook: we can't just inform data producers of what we want them to produce and maintain, we need to properly incentivize and enable them to do so.

In Andrew's view, there is obviously value in collecting feedback on what data is viewed as valuable. But it's going to have bias - essentially it's valued but might not be valuable - so you should develop more concrete ways to measure what data work is useful and valuable. We should track what is being used but also how well what we thought would be valuable actually performed - that way we might better know what additional data might drive incremental value. Your feedback loops should include both quantitative and qualitative measurement where possible.
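To make the point about tracking usage and measuring impact a bit more concrete, below is a minimal, hypothetical sketch of instrumenting data product consumption so usage can later be aggregated by product and purpose. The event fields and the log_usage helper are illustrative assumptions for the sketch, not a schema from the episode or from any particular tool.

```python
import json
import time
from dataclasses import dataclass, asdict

# Hypothetical usage event - the field names are illustrative, not a standard.
@dataclass
class DataProductUsageEvent:
    data_product: str   # e.g. "sales.orders_daily"
    consumer: str       # team or service consuming the data
    access_mode: str    # "sql", "api", "file", ...
    purpose: str        # tag linking the usage to a use case or decision
    timestamp: float

def log_usage(event: DataProductUsageEvent, sink_path: str = "usage_events.jsonl") -> None:
    """Append a usage event to a local JSONL sink (swap for your event bus or warehouse)."""
    with open(sink_path, "a") as f:
        f.write(json.dumps(asdict(event)) + "\n")

# Example: record that the growth team queried a data product for churn analysis.
log_usage(DataProductUsageEvent(
    data_product="sales.orders_daily",
    consumer="growth-analytics",
    access_mode="sql",
    purpose="churn-analysis",
    timestamp=time.time(),
))
```

Aggregating events like these per data product and purpose gives the quantitative side of the feedback loop; the qualitative side still comes from actually talking to consumers.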
If you make ultimatums around data usage - you're either with us as a data user or you're against us - you won't get buy-in, per Andrew. Mandates just don't get the buy-in some people believe. So you need to work to figure out why someone is not leveraging data. Again, make it less intimidating and make it rewarding. If you threaten people - do this or we'll fire you - you will simply get people adhering to the letter instead of the spirit; we want to make using data useful AND fun. Gamifying learning about data and data hackathons are two great ways to accomplish that.

Around data, if we want a "yin yang synergy" between business and IT, both parties have to meet the other _more_ than halfway in Andrew's experience. Both sides have to be willing to partner to improve. There isn't a silver bullet way to accomplish it, but embedding IT in the business and vice versa can certainly help. You could rotate people across different business units. Etc. However, if you are in a decentralized organization, it's very important to make sure you share best practices.

Andrew said, "the most complex system that we have in our organizations isn't a computer, it's the people who are operating the computers." There is a major change in the way our brains work between learning something and trying to get a point across. Some people are good at switching between those quickly - e.g. in a meeting - but many aren't and it's important to not leave them behind. So communication is crucial to get right, and think about the broad group you are trying to work with. Sometimes data should be brought into the discussion to make a point but sometimes it should purely be about increasing data fluency.

It's easy to focus on hiring for data skills in many roles but really, every organization should at least consider data training as part of new employee training, according to Andrew. Obviously don't forget existing employees, but immersing people in data, especially the data of the organization, from the start pays off in the long run.

Quick tidbit:
Data consumers need to understand what is actually possible with data. E.g. lead scoring based on a person's name and email address is not a reasonable request.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Feb 19, 2023 • 27min
Weekly Episode Summaries and Programming Notes – Week of February 19, 2023
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Feb 15, 2023 • 19min
#195 Zhamak's Corner 18 - Fixing Unnecessary Complications in Serving Data to AI/ML
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Sponsored by NextData, Zhamak's company that is helping ease data product creation.
For more great content from Zhamak, check out her book on data mesh, a book she collaborated on, her LinkedIn, and her Twitter.
So, continuing the conversation about AI and ML's place in data mesh, we start the episode with Zhamak discussing an unnecessary complication we've created in data: why do data sets/assets only have to serve one user or even one user persona? Yes, product thinking is about creating reuse, but are we thinking about reuse across regular analytics and ML/AI at the same time? We need to make it easy to give access in the language of - the native mode of access of - the data consumer. We shouldn't have to care what it is used for: regular analytics, ML, or anything in between. There's also a very painful bifurcation between upstream data production and data science where, the second data enters the data science realm of influence, it's copied over and you lose sight of it for discoverability, governance, security, quality, etc. They pull it in and then it's essentially impossible to track. That creates all kinds of problems. So why don't we extend data mesh into what they are doing? Do they need to make copies of the data in the feature store? If they have a trusted source of access to the data, do they care?
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Data Mesh Radio episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Feb 14, 2023 • 18min
#194 The One Where Scott Goes off About Data Contracts (Part 1) - Mesh Musings 43
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Feb 13, 2023 • 1h 8min
#193 The Hidden, Pesky Persistent Challenges in Data-Intensive Applications/Service/ML - Interview w/ Ebru Cucen
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Ebru's Twitter: @ebrucucen / https://twitter.com/ebrucucen
Ebru's LinkedIn: https://www.linkedin.com/in/ebrucucen/
In this episode, Scott interviewed Ebru Cucen, Lead Consultant at Open Credo. To be clear, Ebru was only representing her own views on the episode.
Some key takeaways/thoughts from Ebru's point of view:
- It's far too hard for data producers to actually reliably produce clean, trustworthy, and well-documented data. We need to give them a better ability to do that; whether that is tooling or ways of working remains to be seen. Scott note: It's no wonder it's been hard for many teams to get their domains to own their own data ;)
- There is a hidden challenge in data-intensive service/application development. The version of the data - the schema, the API, and the data itself - needs to be understood and coordinated because developers don't control their own data sources, unlike in software development of the past. But we don't have good ways of doing that right now on the process or tooling front - data product approaches help but fall short.
- We are lacking the tooling to easily manage data quality for producers. While there are so many data related tools, there is a real lack of things that make it easy to manage the quality. We are getting there on observing or monitoring quality, but not managing and maintaining quality.
- Fitness functions can help you measure if you are doing well on your data quality/reliability.
- As the speed to reliably ship changes on the application side increased - microservices and DevOps - that just made the data warehouse, the data monolith, that much harder to deal with. Instead of slow-changing inputs and gentle evolution, it simply became more and more of a data exhaust model that breaks the warehouse.
- Large data monoliths are just far too hard to maintain, especially as the speed of change of applications and the world increases. However, monoliths aren't 'the enemy'; even microservices advocates say sometimes a monolith is the right call. Look to figure out what's the right solution for the now and the future. Don't distribute, don't decentralize without a specific reason.
- ?Controversial?: Data really needs much better version control systems and practices. Yes, there is the versioning of the data product but the actual versioning of the data - that immutability factor, when did this data change and what was it before that - is the most important versioning for data/analytics.
- Versioning means safety - safety for consumers but especially for producers to be able to roll back. We need those better safety features so we can test much more thoroughly in data but right now, we don't have great ways to do that.
- It's hard to fight Conway's Law. If we don't fix our ways of working together, it is extremely difficult for consumers and producers to align well enough to get the most value from our data. Communication issues will be reflected in the data as well.
- The tools we have for data are so specialized, you might need to use 5+ tools just to properly manage a simple ingestion process - the tooling is just not there to support the producers well enough.
- How can we observe and validate data before writing very specific testing - testing shouldn't be the only line of defense. We need a way to define and create our quality gateways much more easily (see the sketch after this list).
- With fast feedback cycles and close collaboration around data, especially with data science, everyone becomes so much more productive. E.g. people aren't building on deprecated data sources and you can get to initially testing a hypothesis in a day or days instead of weeks.
- It's important to think of your data like a garden instead of a single project - you must tend to it and improve it further. Your garden is never "done" and weeds can creep in quite easily. Get that green thumb.
- To build good models in data science, you have to ask what questions can we ask of the data, can we get enough data, is it high enough quality, etc. You need to answer whether you will be able to achieve a likely positive outcome and then iterate towards good - and then make it better - instead of making things all or nothing and making a static model.
- Not all questions can be asked of the data you have and you need to measure how well the data you have can answer the questions you want to ask. Be realistic about what you are trying to do and what you actually can do based on what you have.
- How do we create psychologically safe environments where people can fail safely and learn from that? We need iterative communications, interactions, learning, and development.
- Inject more empathy into your teams and communications. We need a better way of understanding the challenges and what we are achieving together, instead of what is each person's role. The sum of the parts is the purpose.
- ?Controversial?: As we increase the amount of data we have and the number of people attempting to leverage that data - and let's not forget the increasing complexity of the world at large - we are likely to see it get harder to communicate relative to data. We have to try harder than ever to get it right, or at least to an acceptable end outcome.
- ?Controversial?: Similarly, our understanding of certain questions or sets of data will change more frequently than historically, and communicating that - and why our understanding has evolved - is going to get more complex as well.
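As a rough illustration of the fitness function and quality gateway ideas in the list above, here is a minimal sketch of running a couple of data quality fitness functions against a batch of records before publishing it. The specific checks, thresholds, and record shape are assumptions for the example, not something prescribed in the episode.

```python
from datetime import datetime, timezone
from typing import Callable

# A fitness function takes a batch of records and returns (passed, score, message).
FitnessFunction = Callable[[list[dict]], tuple[bool, float, str]]

def completeness(field: str, threshold: float = 0.99) -> FitnessFunction:
    """Share of records where `field` is present and non-null must meet the threshold."""
    def check(records: list[dict]) -> tuple[bool, float, str]:
        filled = sum(1 for r in records if r.get(field) is not None)
        score = filled / len(records) if records else 0.0
        return score >= threshold, score, f"{field} completeness {score:.2%} (min {threshold:.0%})"
    return check

def freshness(field: str, max_age_days: float) -> FitnessFunction:
    """Most recent timestamp in `field` must be within `max_age_days` of now."""
    def check(records: list[dict]) -> tuple[bool, float, str]:
        newest = max((r[field] for r in records if r.get(field)), default=None)
        if newest is None:
            return False, 0.0, f"no {field} values found"
        age_days = (datetime.now(timezone.utc) - newest).total_seconds() / 86400
        return age_days <= max_age_days, age_days, f"newest record is {age_days:.1f} days old (max {max_age_days})"
    return check

def quality_gateway(records: list[dict], checks: list[FitnessFunction]) -> bool:
    """Run all fitness functions; only publish the batch if every check passes."""
    results = [fn(records) for fn in checks]
    for passed, _, message in results:
        print(("PASS " if passed else "FAIL ") + message)
    return all(passed for passed, _, _ in results)
```

A gateway like this is the 'observe and validate before writing very specific tests' idea in miniature - it complements, rather than replaces, deeper testing.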
Ebru started by sharing her background: she was a software engineer and trainer, including training people on SQL, before moving into data/data science. As a software engineer, it was crucial to at least model and understand data well enough to ingest and store it for the application side. The big challenges for software engineers really came in integrating that data into the monolithic data warehouse and then keeping it well integrated as the application evolved. The monoliths were bottlenecks on the software side and the integration into the data monolith was just becoming too much of a major bottleneck for all. As the DevOps and microservices movements picked up steam, the speed of reliably changing the application significantly increased. That increased speed created more and more challenges in integrating into a monolith - the applications drifted far too quickly to easily work with the data monolith. But monoliths aren't necessarily the wrong choice for everyone; it's just that at scale, and especially at scale of complexity, they become a massive bottleneck.

When talking about versioning, Ebru talked about the many-copies-of-data challenge - which one is the right one to use and can I trust it? There are people doing incredibly important work on data where they can't reliably trace it to source and know they are working on the right version. And with no clear ownership of data, nothing ever gets cleaned up, so finding a reliable, repeatable source of data is very hard. So people copy the data they do find to their work area lest the source go away, creating more copies. We've figured out how to do versioning relatively well on the software/microservices side with APIs but we haven't figured it out for data - whether that is versioning the analytical API or the data itself. It's far too hard to make our data assets maintainable right now, thus the big push to data mesh.

For Ebru, when asked specifically what is the most important aspect of versioning in data - code, schema, API, or the data itself - she chose the data itself. This is a somewhat controversial choice but her reasoning was traceability: what actually happened to the data and when did it change? For code versioning, she expects we'll have more version control systems, and many people already manage their data-related code in git or other systems. Another point Ebru made was that software development hasn't really had a focus on aligning itself to what version of data it is using. When you do a production deployment, the database is the database, it's tied to the application. But when we start to think about how we actually deploy software going forward, if it is referencing external data as part of that, the version of the data source it's leveraging obviously matters far more and we need far more coordination to ensure the software is referencing what we need it to. There is not enough tooling out there to easily manage this coordination and it's causing far too many issues.

Scott note: this is a really incremental thought here but VERY hard to explain. Historically, most services have been more or less wholly contained in what data they use, or they access information from other services via a versioned API on the operational plane. So the coordination is less challenging. We have not really figured out well how to do that for data intensive applications - this is partly why everyone is building data products, whether data mesh or not, but it's still challenging if you don't think about providing a steady access mechanism and a way for a consumer to know what they are accessing hasn't suddenly changed without their knowledge. See the episode on my rant on data contracts and how it's not just schema and constraints.
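As a rough sketch of what 'versioning the data itself' could look like in practice - immutability plus traceability of when data changed and what it was before - here is a hypothetical example of publishing append-only, versioned snapshots that consumers can pin to. The directory layout and manifest fields are assumptions for illustration; tools such as lakeFS, Delta Lake, or Apache Iceberg offer far more robust versions of this capability.

```python
import hashlib
import json
import time
from pathlib import Path

def publish_snapshot(records: list[dict], product_dir: str) -> str:
    """Write records as a new immutable snapshot and update a 'latest' pointer.

    Old snapshots are never modified, so consumers can pin a version and later
    trace exactly what the data looked like and when it changed.
    """
    payload = json.dumps(records, sort_keys=True).encode()
    version = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    content_hash = hashlib.sha256(payload).hexdigest()[:12]

    snapshot_dir = Path(product_dir) / "snapshots" / f"{version}-{content_hash}"
    snapshot_dir.mkdir(parents=True, exist_ok=False)
    (snapshot_dir / "data.json").write_bytes(payload)
    (snapshot_dir / "manifest.json").write_text(json.dumps({
        "version": version,
        "content_hash": content_hash,
        "row_count": len(records),
        "published_at": time.time(),
    }, indent=2))

    # The pointer is the only thing that changes; everything else is append-only.
    (Path(product_dir) / "latest.txt").write_text(snapshot_dir.name)
    return snapshot_dir.name

def read_pinned(product_dir: str, version_name: str) -> list[dict]:
    """Consumers read a specific snapshot so nothing changes underneath them."""
    return json.loads((Path(product_dir) / "snapshots" / version_name / "data.json").read_text())
```

Because old snapshots are never rewritten, a consumer can always answer "what did this data look like when my model was trained?" - the traceability Ebru is after.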
We just can't escape Conway's Law, according to Ebru. While many people have applied it to the operational plane, we really need to think about how Conway's Law applies to data. The way we exchange information can't only be the data itself; we need to get better at how we actually communicate and collaborate internally, or gaps in how we communicate will be reflected in the data and our data integrations. Without fixing the way we work together and communicate, producers and consumers will not collaborate well enough to leverage our data to the fullest extent.

Ebru believes that right now, it's still far too hard for producers to reliably publish clean, trustable, and understandable data. We haven't developed great ways of working and the tools are definitely not there yet. So if we try to push ownership on them too quickly, it will not go well. They have historically published what they want and we need to make it far easier to publish what consumers specifically want, or producers won't likely want to participate. Data mesh is a sociotechnical approach but for Ebru, there is a lot of talk about the social side while the technical is still lacking. There are so many tools but they don't work together that well natively and most only do a few very specific things - you could need 5+ tools to accomplish just the ingestion part of a use case. There is also a major challenge on the testing side - can you observe what changes would occur before writing the tests?

In general, we need to change our ways of working in data to enable much faster feedback cycles in Ebru's view. She was working on a project where everyone was in close collaboration and you could try things out and get feedback in the same day, meaning there was far less time spent building toward a solution only to find out the data wasn't available or there were other challenges. With better data ownership, we can go from idea to ingestion to testing in a short period of time, significantly improving how productive data science team members are. Scott note: if you listen to early data mesh presentations from Zhamak, she talks more about data science/machine learning/AI than regular old analytics. This is that data bazaar/data marketplace kind of concept in action.

Ebru believes we need to take more learnings from microservices, especially the concept of Lego pieces. In data, we haven't really built incrementally to achieve good value - it's often been all or nothing. But cloud means we have a chance to do things differently. That iteration means we can fail faster too - if we have an idea but we can't get the right data or even get enough of the right data, instead of building for weeks, we can change course. It's also important to realize you can't ask just any question of any data - sometimes you have a question that simply can't be answered with the data you have or can get, and that's okay.

To do data well/better, Ebru believes we need to create psychological safety and an ability to fail safely. That means we will have to train data consumers far better on how we work with data - a 95% confidence interval doesn't mean what most believe. And our understanding of data evolves too, so consumers must learn to evolve their understanding. Human interaction is far more crucial than many want to believe in doing data well. In data, as Zhamak has mentioned, there is a trend towards super fractional roles; Ebru believes there is far too much focus in many organizations on what specifically is "my role" instead of what is the team's role and how we can make sure we accomplish our objectives. This fractional thinking of course creates more friction and challenges and handoffs - handoffs are always a place of lost context. So work to have teams focused on accomplishing team goals instead of individual ones.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Feb 12, 2023 • 19min
Weekly Episode Summaries and Programming Notes – Week of February 12, 2023
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Feb 10, 2023 • 1h 8min
#192 Diagnosing the Analytics Gap - All About Diagnostic Analytics - Interview w/ João Sousa
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
João's LinkedIn: https://www.linkedin.com/in/joaoantoniosousa/
João's Medium: https://joao-antonio-sousa.medium.com/
Brent Dykes' LinkedIn: https://www.linkedin.com/in/brentdykes/
In this episode, Scott interviewed João Sousa, Director of Growth at Kausa.ai. To be clear, he was only representing his own views on the episode.
The "four types" will be referenced often throughout this summary. The four types refer to the types of analytics: descriptive - what is happening; diagnostic - why is it happening; predictive - what might happen in the future; and prescriptive - what actions should we take.
Some key takeaways/thoughts from João's point of view:
- Of the four types of analytics, diagnostic analytics is VERY underserved. The other three - descriptive, predictive, and prescriptive - are where most organizations are focusing more, so there's a "diagnostic analytics gap."
- ?Controversial?: Of the four, diagnostic analytics requires the most domain/business expertise.
- Tips for improving your diagnostic analytics: 1) show the value of drilling down into the why - find a few use cases and communicate the value well; 2) promote a closer collaboration between data and business people; 3) improve your definitions around data roles; 4) very clear communication of expectations and who does what; 5) don't get into firefighting mode, have a structured approach to diagnostic analytics; and 6) automate the repetitive parts.
- There are 3 levels of diagnostic analytics immaturity: getting "stuck in the what" instead of the why; "the usual suspects" where you look at things from the same angle, same slice and dice, and only leverage a small portion of the data available; and "need for speed" where you have the right data culture of drilling down to the why but often get stuck in the trade-off between speed and analysis thoroughness.
- Signals you are "stuck in the what": 1) in reviews, teams are just reviewing what happened instead of data-driven recommendations or data-driven hypotheses. 2) You lack impactful stories when asked "what real data-driven insights have you shared recently that drove action?"
- Signals you are stuck in "the usual suspects": 1) few to no real new major insights or hypotheses. 2) Lack of incremental data work / no new slices and dices to further analyze data. 3) Indirectly, data teams become more disconnected from the business.
- Signals you are stuck in "need for speed": 1) continuously cutting corners on thorough analysis in the name of speed. 2) Juggling too many tasks / priorities with constant 'hair on fire' type requests.
- Culture and people are the biggest levers in data but the hardest to change. We need better processes and tooling to enable accelerating diagnostic analytics. Tooling and processes specifically for diagnostic analytics are few and far between.
- Many companies do not put much value and/or effort into diagnostic analytics - that is highly correlated to analytics maturity.
- Diagnostic analytics work can be seen as boring compared to predictive and prescriptive work. It's typically not as technically challenging and many data people are not as interested in the business aspects.
- ?Controversial?: The best teams segment their diagnostic analytical questions into strategic, tactical, and operational. High performing teams also adjust thoroughness versus speed to best suit the specific need. They also automate as much as possible to reduce burden on the human in the loop.
- !Very Important!: Decentralization presents a big potential risk to diagnostic analytics. Analytics within the domain seems covered but many questions are cross domain... Scott note: There isn't a clear owner of diagnostic analytics in data mesh - if domains know their own data well, they should be able to do diagnostic analytics on information internal to the domain but it will be far harder cross domain. And that is end-state, not mid mesh journey. Diagnostic analytics likely falls to where you have your business analysts, whether that is embedded only, centralized only, or a mix.
- Insight definition criteria (from Brent Dykes): 1) provides a shift in understanding; 2) is something unexpected where the organization was not previously aware; and 3) it's relevant and/or aligns to what stakeholders care about. João added a 4) it must be delivered on time and communicated effectively.

João started the conversation discussing the four types of analytics: descriptive - what is happening; diagnostic - why is it happening; predictive - what might happen in the future; and prescriptive - what actions should we take. Most analytics work over the last 30 years has been descriptive, and both descriptive and diagnostic are typically owned by the analytics team. Data science, ML, and AI have moved the needle for doing predictive and prescriptive analytics the last few years. But diagnostic analytics remains underserved. That diagnostic analytics gap exists for a number of reasons in João's view. On the people side, diagnostic analytics requires two sets of skills/knowledge: the analytical + technical and the business + domain. Without the domain knowledge, it is far harder to connect the dots around the why - a key concept in data mesh in shifting data ownership left*. Yes, we know sales in this region are falling, but why, what changed? João believes diagnostic analytics requires the most domain knowledge of any of the four types.
* Scott note: I always think of the Pastafarian - or Church of the Flying Spaghetti Monster - figure as to why they dress like pirates and how the number of pirates is strongly inversely correlated to global temperatures. See here. Correlation doesn't mean causation, relevant XKCD here

On the tools and processes side, João believes diagnostic analytics is far less developed than the other three types. Dashboards are great for descriptive analytics - what is happening - including some exploration, but they are difficult to use to actually understand the why, drilling down to the root cause. Culture around diagnostic analytics is another large issue for many organizations - there are many varied approaches and lots of differing views on the actual value of doing deep diagnostic analytics.

João has three different diagnostic analytics immaturity stages before getting to a well-functioning approach. The least mature is "stuck in the what," where the business stakeholders are the ones trying to do diagnostic analytics with low data fluency* to drive to the why. They are only reporting the what, the descriptive analytics. The second maturity level is "the usual suspects" - essentially, the team builds lots of slice and dice dashboards and then just monitors things using those; they don't think to keep adjusting their angles and dig in. The third maturity level is "need for speed." The domains have the capabilities - usually via embedded analysts - to analyze their own data but are almost always siding with speed versus comprehensiveness of analysis. The world and business are changing fast but it takes time to do good analytics well to generate an actual insight. Scott note: this brings up the question of where diagnostic analytics lives in a data mesh implementation. If domains have high data fluency, then presumably they can do their internal analysis but what happens if the information to drive to the why is cross domain? This is why I believe many domains are likely to have their own business analysts but organizations will still have a centralized business analyst team too.

On the question of who does the diagnostic analytics in most organizations - an empowered and highly data literate domain or a centralized analytics team - João said it depends. In a low data maturity organization, it's typically the analytics team - hopefully pairing closely with the business. In a higher data maturity organization, it's about upskilling the subject matter experts in data and providing the right tools so they can do the analysis themselves.

João shared two signals you might be "stuck in the what" and need more diagnostic analytics maturity. The first is that in your weekly or monthly review meetings, you are talking about what is happening and there are only some high-level guesses as to why - "oh, that's _probably_ because we changed the website" - and not much more. Nothing is a data-driven answer or even a data-driven hypothesis. The second is reflecting that you haven't taken any real data-driven actions with a large impact recently. If you aren't driving your actions from your data, it's likely you aren't answering the "why" questions.

It's a lot harder to detect if you are in "the usual suspects" phase of maturity per João. The data teams aren't getting lots of additional requests. The business people are generally happy because they have dashboards that show a lot of information sliced and diced in how they typically look at things. But they are only testing existing hypotheses and not really coming up with fresh/new insights. So two signals are that there aren't really any new insights or hypotheses and there aren't many requests from the domain to the data team. The third signal, one that's indirect, is that because there is that lack of incremental data work and requests, the data teams start to become more disconnected from the business.

For teams stuck more in "need for speed," João said that while it's a better place to be, it's still frustrating. Teams are always trying to balance thoroughness of analysis versus speed. So some signals you are there: pressure to cut corners on thoroughness of analysis in the name of speed - corners actually being cut is another signal - and constant high-priority interruptions for diagnostic analysis, juggling too much and putting aside the long-term work to take care of fast turnaround requests.

When teams break past the immaturity stages for diagnostic analytics, João pointed to a few things high performing teams do well. The first is to segment questions/requests into tactical, strategic, and operational. Strategic questions are typically more big picture so they change less frequently and thus are typically less urgent than operational or tactical requests. Strong teams also adjust their thoroughness versus speed depending on what the situation calls for. Lastly, they automate as much as possible - there is still a human in the loop but repetitive tasks aren't value-add tasks for someone to do.
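As a toy illustration of automating the repetitive part of diagnostic drill-down, here is a hypothetical sketch that compares a metric across two periods and ranks which dimension slices contributed most to the change - the "sales are falling in this region, but why?" style of question. The column names and input shape are assumptions for the example.

```python
import pandas as pd

def rank_slice_contributions(df: pd.DataFrame, dimension: str, metric: str,
                             period_col: str, baseline: str, current: str) -> pd.DataFrame:
    """Rank values of `dimension` by how much they contributed to the metric
    change between the `baseline` and `current` periods."""
    pivot = (
        df[df[period_col].isin([baseline, current])]
        .groupby([dimension, period_col])[metric].sum()
        .unstack(period_col)
        .fillna(0.0)
    )
    pivot["change"] = pivot[current] - pivot[baseline]
    total_change = pivot["change"].sum()
    pivot["share_of_change"] = pivot["change"] / total_change if total_change else float("nan")
    # Largest absolute contributors first - the places to start asking "why?"
    return pivot.sort_values("change", key=abs, ascending=False)

# Example: which regions explain the week-over-week drop in revenue?
sales = pd.DataFrame({
    "region":  ["EMEA", "EMEA", "AMER", "AMER", "APAC", "APAC"],
    "week":    ["2023-W05", "2023-W06"] * 3,
    "revenue": [120.0, 80.0, 200.0, 205.0, 90.0, 88.0],
})
print(rank_slice_contributions(sales, "region", "revenue", "week", "2023-W05", "2023-W06"))
```

Running something like this across several candidate dimensions (region, channel, product line, etc.) is exactly the kind of repetitive slicing worth automating so analysts can spend their time on interpretation rather than manual drill-down.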
João shared Brent Dykes' definition of an insight, which is probably much stricter than many use. First, it must provide a shift in understanding - so not "we found this anomaly", it changes what people know. Second, it must be unexpected - so those teams stuck in "the usual suspects" won't meet this because they are only testing against the expected. And third, it must actually matter; it must be relevant and/or aligned to what stakeholders care about. João added his own criterion: it must be delivered on time and communicated effectively. These are all necessary to actually drive the right action.

João wrapped with a few tips for improving your diagnostic analytics. First, show the value of drilling down into the why - find a few easy initial use cases to really show the value, not your most difficult questions that will take months to really answer. Second, have the data and business people collaborate more closely so the data people can better understand requests and business people can start to think about new analytical approaches. Third, really get clear around your data role definitions: who does what and why, and what _aren't_ they supposed to do. Fourth, start to get very clear on expectations; improve that communication so everyone is on the same page. Fifth, plan ahead and don't get stuck in firefighting mode grasping at straws - it's too easy to approach diagnostic analytics in an unstructured, reactive manner. Sixth and finally, look to automate away the repetitive parts as much as possible.

Quick tidbit:
Beware the 'boring' label for diagnostic analytics. Many data people want to focus on the more technically challenging predictive or prescriptive analytics. Show people diagnostic analytics is valued.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Feb 8, 2023 • 17min
#191 Zhamak's Corner 17 - AI/ML's Place in Data Mesh
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Sponsored by NextData, Zhamak's company that is helping ease data product creation.
For more great content from Zhamak, check out her book on data mesh, a book she collaborated on, her LinkedIn, and her Twitter.
Humans by our very nature categorize things - otherwise how can we really differentiate? How can we learn about new ideas and experiences without finding a way to store them in our mental models? And in data, we've been treating diagnostic and descriptive analytics as an entirely different category to the predictive analytics of AI and ML. The way we partition the world in data is around how data will be used, and then we prepare the data as such, to be very fit for purpose. What if instead we partition around the data domain and don't really care about who or how things are used - we want to serve all consumers - what changes? Can we create data that is simply usable by many? Does that actually reduce complexity overall by not owning data production designed for specific purposes? Do we really need to treat AI/ML as if their consumption is all that different or special?
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.
Data Mesh Radio episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Feb 6, 2023 • 1h 24min
#190 Data User Experience (DUX): An Introductory Panel - Led by Karen Passmore with Alice Parker and Wannes Rosiers
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/Please Rate and Review us on your podcast app of choice!If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see hereEpisode list and links to all available episode transcripts here.Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.Karen Passmore (CEO at Predictive UX) led this discussion with Wannes Rosiers (Product Manager at Raito) and Alice Parker (Data Engineer at DNB). This panel was held in partnership with Data Mesh Learning - you can see a link to the video here: Panel: Data User Experience - An Introduction (Data Mesh Learning and Data Mesh Radio)Alice's LinkedIn: https://www.linkedin.com/in/aliceparker/Wannes' LinkedIn: https://www.linkedin.com/in/wannes-rosiers/Blog post 'The Importance of UI/UX - and why Raito’s first hire was a designer': https://www.raito.io/post/the-importance-of-ui-ux-and-why-raitos-first-hire-was-a-designerRaito blog: https://www.raito.io/blogKaren's LinkedIn: https://www.linkedin.com/in/karenpassmore/Predictive UX: https://www.predictiveux.com/ Some key takeaways from panelist Wannes Rosiers: DUX handles user experience using data for all your users (data producers, data engineers, data analysts, data scientists, report users, ...), hence it is not - and certainly not restricted to - data platform user experience.Data products are typically chained from producer oriented data products up until consumer oriented data products. Downstream product and UX requirements are completely different than Upstream, yet you should be considering all of them. This friction between global UX and local UX is the biggest challenge to scale Data User Experience.Much more than typical digital products, you have no idea how people interact with your data products. We are moving more and more to self-serve analytics, which means that while developing a data product, you don't know what will end up on the screen of your user. You can wireframe fixed dashboards, you can't for certain other data products.As always, accessibility of your data product is a huge part of your user experience. Next to this, it is important to monitor usage to continuously evolve your product. (Not surprisingly the two key elements we work on at Raito)Domain thinking exceeds source thinking in your product. You should abstract away the concept of data sources, users are interested in valid and bounded insights, typically domain-bounded. And last but not least. You need top-level decisions on data product granularities. To make sure you can connect certain data products, they should be interoperable and preferably on the same granular level. Which level to pick is mainly guided by company KPIs, hence you need management support to pick these.Scott note: I am pretty new to thinking and dealing with UX in data so I took an opportunity to write down some of my own takeaways that may or may not agree with any or all the panelists. Hopefully you'll find them useful.The most key takeaways from Scott's view and learnings:If data is not usable, is it useful? 
How many data projects fail simply because no one really made the data usable?UX KPIs are really important because it's such an easy thing to overlook.If you have a bunch of fragmented UXs in your data value chain, your overall UX will suffer. Think about a dish - you might prepare all the ingredients perfectly but if they were all cooked separately, will there be the right flavor harmony? Probably not, it will be disjointed flavors that don't blend well.Sometimes we think about the target business process in user experience so we don't always have to think about directly using a platform - we can think about how things integrate into a business process, a workflow. If I don't have a UI but interacting with data is part of my day-to-day role, that's a user experience.Data projects and data work need three things: business requirements, technical requirements, and user requirements. Many skip the user requirements at their own detriment. The best way to improve user experience is to actually talk to the users about their wants and needs 😅Empathy is crucial to data UX - make sure empathy is baked in to what you do :) stop prioritizing your technical requirements and business requirements over your user requirements.It's very hard to figure out the user experience path between 'I create a data product' and 'user self-service drives an action from the data product'. It can feel a bit underpants gnome-like of step one steal underpants, step two ???, step three profit. So get really in depth talking to your team about what is necessary to actually get to an actionable insight from a data product.UX technical debt, while not insurmountable, will be a bigger hindrance than many expect because the UX is typically intrinsically tied to the underlying implementation. So to improve the UX to a great degree, it can also require improving a lot of the underlying implementation. Definitely not always the case but it's something to watch out for.A lot of additional takeaways from Scott:User experience is often overlooked in data because historically, most data manipulation or analysis has been performed by experts in their own domain - e.g. a business analyst is an expert in SQL so all they need is SQL or data engineers have handled ingestion and transformation so there hasn't been a great UX.It's easy to get trapped in the idea that a "data user" is just a data consumer but if we aren't designing and creating a good user experience for producers and anyone else involved in dealing with data, we are looking for trouble. Each user persona has different requirements and it's important to not try to design one-size-fits-all experiences because it will probably be one-size-fits-none.User experience is about lowering the bar to working with data to make it actionable, whether for a producer or consumer or anyone else. How do we remove friction from data to action?Data consumer user experience spans a lot of things from finding then accessing then understanding data and then often transforming into their own purposes.Data user experience can be improved at any point in a data project's lifecycle but it's always best to start at the beginning if possible.How can we make it easier/quicker/more reliable/etc. to get to an initial answer on is data valuable?Part of ensuring a good data user experience is enabling the data consumers to easily communicate their expectations/requirements to data producers. 
It's not just the UX at point of interaction but also the UX to make the data better.It's important to understand focusing on UX isn't about right or wrong, it's about improving the user experience over time.The data producers and consumers are often an afterthought when designing data architecture. The UX is something at best smeared on top to try to make it a little easier to deal with when it should be a crucial aspect of the architecture itself.When many people think about data UX, they think about someone consuming from a dashboard, not all the steps that data had to go along to get into that dashboard.Metadata is crucial to helping people understand the data they are accessing - especially when you consider being able to trust data as part of the user experience. If you can't trust it, you won't use it!It's important to think about UX as a lifecycle because there is a data lifecycle - information is created and then acted upon creating more information of a sort. So we can't think of it as "data is consumed" and that is the end of the line for the business when it comes to broad user experience of data.It's crucial to think about user experience at the data product level but even more so at the informational/question level. How can data be combined from multiple sources? Does UX extend to data interoperability standards even? Yes, but that shouldn't be managed by the UX team necessarily 😅A very big part of UX is just encouraging conversations because then UX ideas will emerge. 'Oh, you're trying to do X, that makes so much sense, let me make this change to make that less painful.' But too often users take what is given and don't pipe up - give them room and encouragement to share feedback.Similarly, open communication seems really integral to getting to a good UX. You need people interviewing the users but users also need to have space to discuss too. UX knowledge transfer shouldn't only happen because someone asked.Information transfer can be really hard, especially if we do it all as pure documentation. Look for ways to enable better sharing of information - lower the friction to creating simple information sharing mechanisms, whether that's sharing the data or the metadata.A huge benefit of data mesh is reliable and scalable data sharing yes but it's also about the art of the possible - 'oh, we have this data, what about XYZ use case, can we do that?' How can we embed curiosity and an ability to explore more in our data systems while still protecting privacy/PII/etc.?It's easy to get myopically focused on the platform UX and not the total data UX. Some of that is cultural too. You want to try to design for it but the platform team can't own developing the culture. But making it loud and clear what would drive more value might help you design a better overall data UX.It's very hard to know just how a user will interact and experience a data product. So have some conversation with potential users and maybe give them a private tutorial so they don't go off the rails. It's hard to embed 'understanding guardrails' into data we share.It's easy to lose sight of cost when considering user experience. How do we make it so we can enable great experiences in a cost-effective way?It would be great to have data usability design patterns and methodologies emerge for creating usable data products but we are VERY early days there.Usability testing is pretty crucial because while they aren't your enemy, the saying "no plan survives contact with the enemy" is still apt with data work in general. 
Don't design in a vacuum, work on constant communication and information flow.It's crucial to think about UX at the micro and the macro level. If you only think about it at each data product, you are missing the information of the organization and how it all interoperates to tell stories. But it's also pretty easy to focus on the grand platform and then dealing with the data products is a bit of a nightmare. So again, it's not easy :)Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see hereAll music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Feb 5, 2023 • 33min
Weekly Episode Summaries and Programming Notes – Week of February 5, 2023
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf