
Data Mesh Radio
Interviews with data mesh practitioners, deep dives/how-tos, anti-patterns, panels, chats (not debates) with skeptics, "mesh musings", and so much more. Host Scott Hirleman (founder of the Data Mesh Learning Community) shares his learnings - and those of the broader data community - from over a year of deep diving into data mesh.
Each episode contains a BLUF - bottom line, up front - so you can quickly absorb a few key takeaways and also decide if an episode will be useful to you - nothing worse than listening for 20+ minutes before figuring out if a podcast episode is going to be interesting and/or incremental ;) Hoping to provide quality transcripts in the future - if you want to help, please reach out!
Data Mesh Radio is also looking for guests to share their experience with data mesh! Even if that experience is 'I am confused, let's chat about' some specific topic. Yes, that could be you! You can check out our guest and feedback FAQ, including how to submit your name to be a guest and how to submit feedback - including anonymously if you want - here: https://docs.google.com/document/d/1dDdb1mEhmcYqx3xYAvPuM1FZMuGiCszyY9x8X250KuQ/edit?usp=sharing
Data Mesh Radio is committed to diversity and inclusion, including in our guests and guest hosts. If you are part of a minoritized group, please see this as an open invitation to be a guest - just hit the link above.
If you are looking for additional useful information on data mesh, we recommend the community resources from Data Mesh Learning. All are vendor independent. https://datameshlearning.com/community/
You should also follow Zhamak Dehghani (founder of the data mesh concept); she posts a lot of great things on LinkedIn and has a wonderful data mesh book through O'Reilly. Plus, she's just a nice person: https://www.linkedin.com/in/zhamak-dehghani/detail/recent-activity/shares/
Data Mesh Radio is provided as a free community resource by DataStax. If you need a database that is easy to scale - read: serverless - but also easy to develop for - with many APIs including gRPC, REST, JSON, and GraphQL, all of which are OSS under the Stargate project - check out DataStax's AstraDB service :) Built on Apache Cassandra, AstraDB is very performant and, oh yeah, is also multi-region/multi-cloud so you can focus on scaling your company, not your database. There's a free-forever tier for poking around/home projects and you can also use code DAAP500 for a $500 free credit (apply under payment options): https://www.datastax.com/products/datastax-astra?utm_source=DataMeshRadio
Latest episodes

Sep 2, 2022 • 1h 4min
#122 Unlocking Your Data Value Through Good Product Practices - Interview w/ Alla Hale
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.

Data Product Manager role on Alla's team: https://jobs.ecolab.com/job/15770737/data-product-manager-barcelona-es/

Alla's LinkedIn: https://www.linkedin.com/in/allahale/

In this episode, Scott interviewed Alla Hale, a Data Product Manager at Ecolab. To be clear, she was only representing her own views rather than speaking on behalf of the company. She is also hiring in Barcelona - see the link above.

Some key takeaways/thoughts from Alla's point of view:

- The most useful question in your quiver as a data producer is "what value would having this unlock for you?" It's not about pushing back, it's about skipping ahead to collaborative negotiation. How can you work together to unlock business value?
- It's important to remember this when there is a data request: "they are coming to you because they need your help." Act accordingly - with empathy and patience.
- You need to really consider the data user experience (DUX) of your data products. How can you quickly get people past figuring out "what the product is" to leveraging the data product to drive value? You want users to enjoy using your data product.
- User-stated requirements often do not match actual user needs. To maximize the return on your data work, look to exchange context to find the needs instead of just taking requirements at face value. And do so with patience and empathy.
- No prototype, no meeting = have something tangible - even if that is simply a process map on a Post-It note - for people to react to. Otherwise, what will the conversation be about? How do you prevent the meeting from being a waste of time unless there is a specific topic to address?
- We need to take lots of learnings/practices from tangible/physical goods product management when thinking about data products. We have users who have needs. How can we best serve those needs and drive value through that? It's always about serving the users' needs.
- Another product management learning: how can we do fast prototyping? Prototypes have an actual cost, even in data/software. What gets us to value quickly? How can we capture value early as we iterate towards product quality?
- All products - and their features - have a lifecycle. Be prepared to prune features, or an entire product, if they no longer drive more value than the cost to develop and maintain them.
- Discussing sunsetting and pruning should happen with users even during the development process. Far too often, data consumers are used to data assets being available in perpetuity - even as they degrade. We need data consumers to be part of an active conversation about whether they are still using something and how to create more value or lower the cost of producing the data.
- How can we avoid the data product manager having to do literal sales and marketing of their data product internally to drive usage? How can the organization make internal communication easier instead of literally marketing data products internally?
- To really create the most value from data products, there must be bi-directional communication. One, the data product owner should create demand for what is available - otherwise, why build for reuse if multiple parties won't use your data products? But two, we also need to react to data consumer demand to drive incremental value through new data products or features.
- You should never develop a new data product without a very specific use case. Scott note: this is a common question in data mesh conversations - from conversations thus far, it's about a 75/25 split in favor of only building to a specific use case.
- It's crucial to understand that the data doesn't make the decision. And only look to measure things where the results could actually change your view.

Alla started with what data product management should take from general product management, based on her time managing products in a number of tangible goods spaces. Start from the basics: you have a user and they have needs that you want to try to meet. And you are responsible for discovering and summarizing those needs - it's not that the user should understand all their needs upfront; that just leads to requirements that often don't actually address needs... Just be careful to extract needs and push back on requirements with patience and empathy. It all starts with needs, not the data.

Alla gave some thoughts on needs versus requirements and expectation setting. Consumers in a physical goods space understand that you don't just magically have the prototype or product - but with data, there is often the misconception that "you already have the data, why can't you just share it?" So you often have to start conversations by setting realistic expectations - about next steps, timeframes, etc. Helping data consumers understand that you aren't deprioritizing them, what's actually possible and when, and why, will lead to a better relationship.

There is an understanding in the physical goods space that prototypes have a real cost; per Alla, we need to help consumers understand it's the same in data and analytics. Different grades of prototypes mean different costs and time to develop - how high of quality do you really need this to be? How high quality for the prototype versus the end state? And exchange context around which requirements/needs drive which challenges, so consumers might deprioritize some needs for you.

Alla's key phrase is "what would having this unlock for you?" Instead of pushing back by asking "why do you need this?", her framing gets the other person on the front foot, leaning into the conversation to share what this could mean for them. It gets them to what Scott calls "collaborative negotiation" - a process where you can quickly iterate to what's really of value instead of a list of requirements that might not actually serve the needs of the use case. This question extracts more context from the data consumer, and you might even be able to add incremental value that wasn't part of their ask. Again, this is not about deprioritizing or pushing back; it is about driving to the business value. Remember, they are coming to you because they need your help.

A general rule Alla has is "no prototype, no meeting." That prototype can be just a small drawing of a process map on a Post-It note. The goal: if we are going to have a meeting, we are meeting about something - something people can react to. This prevents the dreaded update-with-no-reason-to-meet meeting - the epitome of "this meeting could have been an email," in Scott's words. Instead, concrete aspects of a prototype elicit a response, and that response means you can decide where to focus next or whether what's been developed thus far will meet needs.

Alla emphasized that in product management, you are designing for people and developing specifically for the user. People aren't robots and we need to give space for emotional responses instead of purely logical ones. And while you are looking to serve user needs right now, those needs might change - or you might fully serve the need with your product so the need doesn't exist anymore. Thus, you need to understand that products have a lifecycle. Don't be afraid to sunset things that are no longer valuable.

As well as sunsetting a product, the concept of pruning should be discussed with data consumers at the start of development. You want data consumers to understand that things will change, and some of that is removing "features" of the product - which might be certain parts of a data set or certain ways of accessing the data product - when the costs exceed the value to maintain. You want data consumers to be active in the conversation about what is still useful and what isn't, but sometimes you'll probably have to resort to turning something off and seeing if people scream. Hopefully the mesh tooling can give data producers good insights into what is being consumed, though even those metrics can mislead - much consumption is automated rather than actively used.

One thing that is quite dissimilar in data product management from her prior product management roles: previously, she had partners in R&D, marketing, and sales throughout the process. Much of what each of those roles added to the product was crucial in driving value. How can we make sure we are having the conversations internally to drive usage, but without the data product owner and/or manager literally marketing and selling? Alla sees it as somewhat on the data product manager but to a much larger degree on the organization - how are data products marketed internally? How can people learn about new products or changes instead of having to discover every data product themselves?

Alla shared her view that you should never develop a new mesh data product without a very specific use case. This is a common question in data mesh: should your reason for creating a new data product be a specific use case, or the domain sharing information it believes will be useful?

An interesting concept Alla brought up is that all domains should think about who can benefit - especially who will benefit most - from the data they have. That way, you can reach out to collaboratively find a use case and develop a mesh data product to serve it. Really consider what you've got and how that could be leveraged internally instead of simply waiting for requests. It also means there is collaboration from the start, so consuming domains might be better partners if you go to them. If your organization is going to encourage this, you need to find a way to incentivize/reward domains for finding good use cases for their data.

For Alla (and Scott), a truly underappreciated need in data is the data user experience (DUX). You should aim for consumers actually enjoying using the product. Zhamak has mentioned similar things, especially her love of the book The Design of Everyday Things. Data can be intimidating to many - how can you make it so people don't feel stupid when first working with your data product? As Alla said, look to get people quickly past "what the product is" so they can focus on what the product can do for them.

Alla wrapped up by sharing her view of data-driven: it's crucial to understand that the data doesn't make the decision. We need to use data to inform our decisions, but at the end of the day, people still make the decisions. And use that thinking to decide what you want to measure and why. Katie Bauer mentioned this too in her episode: if you won't change your behavior no matter what the result is, why spend the effort to measure it?

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Aug 31, 2022 • 21min
#121 Zhamak's Corner 2 - Are You Ready for Data Mesh?
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Sponsored by NextData, Zhamak's company that is helping ease data product creation.

For more great content from Zhamak, check out her book on data mesh, a book she collaborated on, her LinkedIn, and her Twitter.

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.

Data Mesh Radio episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Aug 29, 2022 • 1h 10min
#120 Applying ML Learnings - Especially About Drift - To Data Mesh - Interview w/ Elena Samuylova
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies (info gated) here.

Elena's LinkedIn: https://www.linkedin.com/in/elenasamuylova/

Evidently AI on GitHub: https://github.com/evidentlyai/evidently

Evidently AI Blog: https://evidentlyai.com/blog

In this episode, Scott interviewed Elena Samuylova, Co-Founder and CEO at Evidently AI, the ML model monitoring company - and open source project. This write-up is quite a bit different from other recent episode write-ups: Scott has added a lot of color on not just what was said but how it could apply to data and analytics work, especially data mesh.

Some key takeaways/thoughts, this time specifically from Scott's point of view:

- A good rule of software that applies to ML and data, especially mesh data products: "If you build it, it will break." Set yourself up to react to that.
- Maintenance may not be "sexy" but it's probably the most crucial aspect of ML and data in general. It's very easy to create a data asset and move on. Doing the work to maintain it is really treating things like a product.
- ML models are inherently expected to degrade. When they degrade - for any number of reasons - they must be retrained or replaced. Similarly, on the mesh data product side, we need to monitor for degradation to figure out if products are still valuable and how to increase their value.
- Data drift - changes in the information fed into your model, e.g. a new prospect base - can cause a model to perform poorly, especially against that new segment of prospects. Data drift detection could actually be very useful to pass along as an insight itself: has something changed with our demographics? If so, what? When? Do we know why?
- Concept drift - the real world has changed so your model is not performing as expected - is a crucial concept in data and analytics too. Are we still sharing information about the things that matter? In a way that is understandable? Are we encapsulating what's happening in the real world in our mesh data products?
- Concept drift feels similar to semantic drift in the analytics world, so we can potentially take deeper learnings from how people approach and combat concept drift in ML and apply them to data mesh. How can we monitor degradation in mesh data products and prevent that degradation in our data and analytics work? Historically, reports drifted further and further from reality with no intervention because the pain of change was so high. Are we fully reliant on the domain to know? Can we use software to help us detect semantic drift? Very early days on that one.
- ML models are designed to do one thing very well. Unfortunately, we don't have a good framework for reuse at the model level in ML. Maybe at the ML feature level?
- ML models have expected performance metrics. Those expectations need to be set through conversations between the business team and the ML team, and measured via KPIs. Can we use a similar approach to expectations - at least for some specific use cases - for a mesh data product?
- When building an ML model, you need to consider scope, business purpose, expectations, measurement against expectations, etc. Similarly, when doing any data work, you should consider the same. It is somewhat hard to measure the impact of most mesh data products, but that doesn't mean you shouldn't try. What are you trying to achieve with the data product and is it meeting those expectations? Is the business need still relevant or has it changed?
- Regarding graceful evolution - preventing breakages from changes in sources, and preventing downstream breakages from changing the ML model and/or its outputs - ML unfortunately does not have any answers we aren't already using on the data and analytics side: good communication, contracts, monitoring/observability, etc. No silver bullet, no MLMFD - ML Magic Fairy Dust.
- The ML concept of a feature - a smaller component feeding the model that might be reusable across multiple models - could be interesting to consider in data mesh. It would likely break with Zhamak's view of each data product owning its own transformation logic, but it could create almost proto-transformed data - almost like a service bus to easily serve data products. Probably has lots of drawbacks but interesting to consider.
- Guardrails on ML models help keep models from doing things like reacting to data that is far out of the norm. As Elena said, if an ML-based recommendation on a website is a bit off, the conversion rate falls, but that's not the end of the world. What if you are dealing with big-dollar decisions? Should we proactively put guardrails into our data products? Probably yes, if they are driving crucial decisions - consider failure modes and what to do in those cases.
- Getting to fast incremental value is crucial when developing ML models. There needs to be very good trust and communication so people understand the initial quality level might not be great as you iterate towards a better model - or mesh data product. This is becoming a common theme: how can you release a v0.1 or v0.0.1 of your mesh data product and still drive value now while getting it to v1.0?

Elena started by sharing a basic definition of drift in ML. Drift causes model degradation - so the model is not as effective as expected - and can generally be split into data drift and concept drift. Data drift typically means something about the source data feeding your model has changed. That doesn't mean using a new source; it's more that you are interacting with, say, a different set of prospects or customers than you were previously, so your predictions of their behavior are going to be wrong - you built a model to react to a different set of people. Concept drift is more aligned - at a very high level - with semantic drift in data and analytics: some aspect of the real world has changed. If you look at spending habits, especially in ecommerce, between February 2020 and April 2020, as the global pandemic started to take off, the real world changed a LOT. That was an extreme example, but the real world is ever changing - how can we make sure we are still measuring and sharing the most meaningful information in our mesh data products?

A very important aspect of ML model drift, per Elena, is that it is entirely expected. Drift, and its resultant model degradation, is part of ML model reality.
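To make the drift discussion concrete, below is a minimal sketch of what data drift detection can look like in practice. It is a generic illustration, not Evidently's actual API; the significance threshold, the share-of-drifted-features rule, and the "retrain or pause" decision are all assumptions made for the example.

```python
# Minimal data drift check: compare live feature distributions against a
# training-time reference sample using a two-sample Kolmogorov-Smirnov test.
# Generic sketch only - not Evidently's API. Thresholds and the
# "retrain or pause" rule are illustrative assumptions, not recommendations.
import pandas as pd
from scipy.stats import ks_2samp

DRIFT_P_VALUE = 0.05      # assumed per-feature significance level
MAX_DRIFTED_SHARE = 0.3   # assumed share of drifted features that triggers action

def drifted_features(reference: pd.DataFrame, current: pd.DataFrame) -> list:
    """Return the numeric columns whose current distribution has shifted."""
    drifted = []
    for col in reference.select_dtypes("number").columns:
        _, p_value = ks_2samp(reference[col].dropna(), current[col].dropna())
        if p_value < DRIFT_P_VALUE:
            drifted.append(col)
    return drifted

def check_model_health(reference: pd.DataFrame, current: pd.DataFrame) -> str:
    numeric_cols = reference.select_dtypes("number").columns
    drifted = drifted_features(reference, current)
    if len(drifted) / max(len(numeric_cols), 1) > MAX_DRIFTED_SHARE:
        # Degradation is expected, so the response is planned - retrain,
        # replace, or pause the model - never a surprise.
        return f"DEGRADED - retrain or pause; drifted features: {drifted}"
    return "OK"
```

The same kind of loop could just as easily watch a mesh data product's output distributions, which is the parallel to semantic drift drawn throughout this write-up.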
There is a cost to dealing with drift, but when an ML model is negatively impacted, it is no longer making optimal decisions. So when you detect drift past a certain degree, you retrain the model, shut it down, or replace it with a new model. It's also hard to say how long a model will be in production before it seriously degrades, or what the degradation threshold is at which you should retrain or replace. Similarly, in data mesh we need to think about how we evolve our data products to prevent degradation. ML models are purpose-built to do one thing but start to degrade over time. Often in data and analytics, we've treated data assets the same way - we kept using the same reports as they degraded rather than replacing or evolving them. We need to do better in data mesh.

According to Elena, good ML practice means each model is designed to do one thing very well, not many things at once. ML models are sometimes misused in organizations as people try to apply the same model to multiple use cases. This is similar to the way many people use data assets - created to answer one question but leveraged to try to answer another. If there isn't a good understanding of exactly what the data asset addresses and how, it often leads to bad/incorrect conclusions when answering other questions.

So, how do we measure whether an ML model has degraded? And how do we fix it if it has? Per Elena, you should measure your model against a certain set of expectations, typically via KPIs. If the model is no longer hitting expectations, it has likely degraded. Then you would look to retrain it - use the same steps as before to train your model but with the most recent data - or replace it.

Elena believes the most important aspect of building an ML model is communication first. What are you actually trying to do? What is the business reason for creating a model? Once the model is created, what are reasonable expectations - and how do we stay away from unreasonable ones? What are the business metrics to build the KPIs around? How will you track performance against expectations/KPIs? These same types of questions can be applied to a mesh data product. Why are you creating the data product? What is the target use case and what is the expectation for the use case? Is the use case meeting expectations? If not, is that because of the data product or the use case itself?

Scott asked Elena about graceful evolution of ML models - how can we set ourselves up to deal with upstream changes more easily, and how can we manage not to break things for downstream consumers? Her answer was unsurprisingly familiar: lots of good communication, (data) contracts, monitoring/observability tooling, guardrails, etc. Similar to the data mesh concept, Elena believes you should really think of each model like a product.

When asked how the ML and analytics sides of the house can better collaborate, Elena hopes that in many organizations they aren't overly separate to begin with. Embedded ML engineers are similar to the embedded data and analytics capabilities/teams model many are using with data mesh. And she hopes that once the super fast evolution of data stacks slows down, both sides can start to consider using the same tooling. But the biggest driver will be good communication - it pretty much always comes down to communication...

As mentioned earlier, Elena strongly believes you should not try to use the same model for multiple purposes. But in ML, there is the concept of a feature. Essentially, it is a component that might be shared across multiple different models, producing a subset of each model's data input. So a feature could be reused across many ML models. This feature concept might be interesting to explore as a kind of proto mesh data product - a way to prevent multiple data products from doing the same work. An ideal way to prevent this duplicated work is communication - as Omar Khawaja discussed in his episode - but a company-wide source data catalog or repository could also ensure everyone knows what data is being transformed and who owns it, especially while new data products are in development, so there is a much smaller chance of teams doing the same work.

We can learn a lot from ML monitoring/observability, per Elena. In ML, you need to monitor the overall ongoing quality of data ingested into the model, the quality of the output of the model, and also the quality at the point of ingestion. Often, that last part - quality at the point of ingestion - is managed by guardrails. If data is not within a certain specification, it is not passed into the model. Or the model doesn't react outside of certain bounds. Or, if some metrics about the model are far off historical norms, the model essentially gets paused and there is a failover to a less rigorous solution. Sometimes, passing bad data into the model is not the worst outcome - your Amazon recommendation is for buying another toilet seat... how many of these do you think I want, Amazon? But ML models can power very big-dollar decisions. These guardrails could be very useful in data mesh if you are driving decisions with a fast turnaround: alert that something is unusual, then determine whether there is a new normal or something funky upstream.

Elena mentioned ML development sometimes has a perfectionist issue: people try to get to a perfect model before deploying anything, instead of getting to fast value - putting something into production that incrementally increases value quickly - while improving the model through quick iteration and tight feedback loops. This is becoming a very common theme in many interviews: how do we get to incremental value very quickly while we improve the long-term mesh data product through fast iteration?

Elena wrapped up with two thoughts: 1) Maintenance of your ML models isn't "sexy" but it's probably the most important aspect. Maintenance means proactive upkeep, monitoring, setting up good feedback channels, communication in general, etc. It's not just the model in a vacuum - is it having the impact you expect? And be prepared to pay for the maintenance. Which plays into 2) "if you build it, it will break." Set yourself up to detect the issues, budget people's time to keep things running and fix what breaks, and don't be surprised when it breaks.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Aug 28, 2022 • 25min
Weekly Episode Summaries and Programming Notes – Week of August 28, 2022
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Aug 26, 2022 • 1h 23min
#119 Cautionary Learnings From a Startup Doing Data Mesh: Orfium's Journey to Decentralized Data Success - Interview w/ Argyris Argyrou and Konstantinos Siaterlis
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.

Argyris Argyrou's LinkedIn: https://www.linkedin.com/in/argyrisargyrou/

Konstantinos "Kostas" Siaterlis' LinkedIn: https://www.linkedin.com/in/siaterliskonstantinos/

In this episode, Scott interviewed Argyris Argyrou, Head of Data, and Konstantinos "Kostas" Siaterlis, Director of Big Data, at Orfium. There is a ton of useful information on anti-patterns, what is going well now, advice, etc. in this one. From here forward in this write-up, A&K will refer to Argyris and Kostas rather than trying to call out specifically who said which part.

Some key takeaways/thoughts from A&K's points of view:

- On a data mesh journey: "It's not a sprint, it's a marathon." Pace yourself. It's okay to go at your own pace - don't worry about what other people are doing with data mesh, do what's right for you.
- Really focusing on the why and showing people results was a far better driver of buy-in and participation than any amount of selling data mesh as a practice. Calling it data mesh when trying to explain it to people outside the data team didn't go well either...
- Orfium's "Data Doctor" approach - low-friction, low-pressure office hours with a staff data engineer - has really helped people with their data challenges and helped spread good data practices, without the "Doctor" becoming a bottleneck.
- The Data Doctor's role is to answer questions and provide guidance, not do the work for people. Then, the Doctor takes what was discussed and documents the best practice for others to learn from - providing good leverage for scaling best data practices.
- In a smaller company like Orfium (~250 people), it's hard to justify many full-time heads to implement data mesh. But trying to treat a data mesh implementation like a side project also creates issues. There isn't a great answer here except possibly to take things slower than most startups are used to. Your data will still be waiting for you a few months later.
- If you are having difficulty driving broad buy-in, show people what data mesh can do in action - that really helped at Orfium. Once they saw the approach delivering value, they wanted to participate.
- When trying to drive buy-in, specifically talking about data mesh didn't work well with non-data folks. It's very easy for data folks to get confused about data mesh - just imagine it for non-data folks.
- Trying to use Zhamak's articles as the required early state - where you need to be just to get moving - requires far too much work. Get to a place where you can try, learn, iterate, and repeat on your way to driving value. It's a journey!
- It's probably not a great idea for your first use case to be your most advanced or complicated - you will build your platform to serve those specific needs instead of general affordances. Jen Tedrow's episode covers this quite nicely.
- Really assess how much additional work your data products will be for a data product owner. For Orfium, it was something to add to the existing product managers' plates, as it wasn't a huge incremental burden just yet.
- Consider splitting your mesh data product ownership between business context ownership and technical ownership.
- It's okay to head down the data mesh path while still learning what domains really mean. Orfium was not doing Domain Driven Design in any sense before starting to do decentralized data.
- Don't try to get all your teams to start developing data products at the same time. It is very hard to work with and upskill that many teams, even in a smaller organization.
- A&K recommend starting slower than you probably want on building out your platform; and start small in general - don't take on the biggest challenges, or too many challenges, at once.
- It's okay to have a very high-level concept of a mesh data product. For Orfium, that is: a product that is designed like any other software product - it solves problems for customers and is delivered via API. That helps non-technical people understand what they are delivering.
- "Really think about what you are doing and why. Why complicate it more than that?" - said about data products but pretty universal in data mesh. It's easy to overcomplicate...

About 2.5 years ago, when Argyris joined the company, Orfium - a growing startup in the music industry royalty business - was starting to see a big uptick in data requirements to serve their customers and offer new features/capabilities. They had 3-4 people doing the data engineering and data science work and another centralized BI team, but the need for more advanced ML and AI was becoming clear. The centralized data capabilities were becoming a bottleneck, so they would either need to significantly scale the number of people in centralized data functions or look to decentralize in some way. They decided to try data mesh because they were feeling the exact challenges Zhamak so clearly laid out in her articles.

Kostas, for his part, was really bought in to data mesh's self-serve aspect. Pretty much all the data in the company for analytics was flowing through him and his small team, and that was understandably draining. Data mesh is more of a cultural approach than a technical one. He said, "I will build something interesting either way," so the loss of certain data engineering tasks meant he could focus on building the platform, which was just as interesting, if not more so.

At first, for A&K, data mesh was definitely a bumpy road - they tried to do data mesh as a side project while keeping up with everything else they were doing. They couldn't justify treating the data mesh implementation as a main focus at the expense of the many things in production, and they didn't have the spare headcount either. As they built the first version of the data platform, they were having difficulty explaining data mesh to non-data folks. So the team most bought in and willing to try a data mesh approach was the central data engineering team - and thus the first version of the platform was advanced data tooling for data-intensive use cases, with the central data team as the main users. It worked well for the central data team, but the platform wasn't really built for people who weren't highly data literate with intensive use cases, so driving buy-in with other teams was hard.

On trying to drive buy-in, A&K worked to explain internally what they were doing and what data mesh was. They started from the why - why would this matter - and that got the exec team excited. But when they tried to sell data mesh to the engineering managers and engineers, it fell flat. The non-data folks didn't really understand the nuances of data mesh - and in most cases they didn't need to in order to participate in and benefit from the implementation. It's very easy to overwhelm people with all the aspects of data mesh instead of what matters to them.

What ended up driving buy-in quite well, per A&K, was seeing the output of treating data like a product in action. Once there were tangible benefits and people could see what value a mesh data product could deliver, they were much happier to participate in the data mesh implementation.

Per A&K, a few of their missteps - so you can avoid doing the same - were:
1) Trying to sell data mesh by the principles and the phrase "data mesh" instead of the why and what changes for whoever they were talking to;
2) Building the platform to serve the most data-intensive use cases owned by data engineers, so it was quite hard and not really suitable for others to use;
3) Trying to get to advanced maturity in all parts of the data mesh implementation up front - e.g. they don't have fully automated access control, but it's not really a big pain point;
4) Thinking Zhamak's articles or book are the blueprint for where you have to be at the early stages, instead of an aspirational goal multiple years into a data mesh journey; and
5) Trying to get all teams to move together at the same time, instead of working team by team or mesh data product by mesh data product.

So, what _did_ work for A&K at Orfium in their data mesh journey? Again: 1) showing people the results and the value the approach could deliver for a use case; 2) not hiring new people into roles - when they looked at the additional workload, with good support and upskilling/partnering, the domains could handle it as they were (per Scott, this tends to be the case in smaller companies with smaller domains doing decentralized data); 3) focusing on the why: why does this matter? If we get this right, what will that get us? And 4) using data mesh as an enabler to change people's hearts and minds about owning and using data; it's now more a core part of teams' responsibilities and they are taking it seriously.

What does good look like for Orfium right now? Per A&K, their teams understand the difference between operational and analytical data and are starting to manage their analytical data as a product. It has changed the role of data engineering and how people perceive the central data team internally - it is now the enabler, not the team that does the work for them. Their ML/AI teams are able to get quality data reliably so they can build out new use cases. They still don't have fully automated access control, but they focused on making it far easier to request and grant access, and that's a good enough point for them at the moment.
They are at 10-15 data products - hard to say exactly what constitutes a product, hence the range - with far happier data consumers.

Orfium wasn't doing Domain Driven Design (DDD) on the operational or data side of the house prior to starting their data mesh journey. And A&K think it's totally viable to not be doing DDD at all before starting - other guests on episodes more centered on DDD for data have said similar things. Consider this permission to move forward.

A&K gave some insight into how attitudes towards and understanding of data have changed at Orfium. Previously, someone would want to do some analytical work against a team's data, and the product manager would get them some kind of DB access - so they could only access the data as it was stored for the operational system. There wasn't a clear separation between operational and analytical data. A lot of the evolution was getting teams to understand how their data might be used for analytics - and to produce and own data for that purpose. Far easier said than done, but still achievable. What also really helped was splitting business and technical ownership of analytical data.

So, with all this learning behind them, what are a few bits of advice from A&K? Start slower on the technology build-out; it's exciting to build cool stuff, but that can wait - focus on what you need to get to the low-hanging-fruit value. Start small; don't try to have all the domains (or, if you are not doing DDD yet, your teams) move all at once. Don't mention the phrase data mesh to people outside the data team - it typically just generates confusion. Speak to the value of the approach, not the technology or what it changes for the data team - what does it change for them?

When working with teams to understand the concept of a mesh data product, it's easy to overcomplicate things, per A&K. For Orfium, there is a technical definition of a mesh data product and a business one. The business definition is quite simple: it's a product designed like any other software product - it needs to solve a problem for customers and is delivered via API. Really think about what you are doing and why - "why complicate it more than that?" The data products create a platform of information for teams to build data-informed applications on top of. Oh, and don't forget really good documentation for your data products.

Orfium has a really interesting concept they are using internally: "The Data Doctor." Essentially, people come with their "data symptoms" and the Data Doctor gives them a "prescription" - advice on how to address their challenge using best practices. It's a low-pressure way to have something like a staff data engineer hold office hours to help people with their data challenges, ensuring people follow best practices but also learn how - and gain the confidence - to implement the recommendations themselves. Then the Data Doctor works with whoever implemented the advice to document the process and put it in a central repository, so others can easily follow the same practices if they hit the same or a similar challenge.

Per A&K, on your data mesh journey: "It's not a sprint, it's a marathon."

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Aug 24, 2022 • 14min
#118 - Zhamak's Corner 1 - Is Data Mesh Right For You?
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Sponsored by NextData, Zhamak's company that is helping ease data product creation.

For more great content from Zhamak, check out her book on data mesh, a book she collaborated on, her LinkedIn, and her Twitter.

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.

Data Mesh Radio episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Aug 23, 2022 • 10min
#117 Data Mesh and Fight Club - How Should We Discuss Data Mesh Internally - Mesh Musings 27
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Aug 22, 2022 • 58min
#116 A Startup's Early Journey Towards Decentralizing Data - Iterable's Analytics Evolution - Interview w/ Riya Singh
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.

Riya's LinkedIn: https://www.linkedin.com/in/riyasingh1/

In this episode, Scott interviewed Riya Singh, Business Insights Manager at Iterable.

Some key takeaways/thoughts from Riya's point of view:

- ~4 years ago, Iterable was in essentially "spreadsheet hell": lots of manual data work and no standard way of storing or sharing data across domains. While domains had good data capabilities, integration and coordination between domains was very difficult at best.
- Most exec questions can't be answered with data from a single domain, so cross-domain data integration became a key factor in Iterable continuing to grow. How could they make crucial decisions informed by data if integrating it required so much ad hoc manual work? Could they really trust something done manually each time?
- Fast time to market for simple, base-level capabilities of their data platform was much more valuable than trying to nail every feature upfront. Data consumers understood the data wasn't perfect at the start, but it led to much faster exploratory data initiatives, which led to valuable insights sooner.
- You might get a much higher ROI from buying tools than from trying to get by on low-cost but not feature-rich ones. If you build a very cost-efficient data platform that no one wants to use, is that actually valuable? How much time will you spend managing the tools, and is it worth it to outsource that to a vendor?
- Combining data across sales, marketing, and product meant Iterable could tailor marketing messages and find better prospects, measure marketing return on investment (ROI), and cost-optimize their operations and product, among many other new insights.
- As teams that previously weren't directly interacting start to have more conversations, gaps in your data - whether in data created/collected or data shared - will emerge. Filling those gaps will mean you can answer more high-value questions to drive the business forward.
- At Iterable, when there is a specific use case identified for cross-domain data integration, the central data team takes over ownership of what would be considered a consumer-aligned data set in data mesh terms. With only 4-5 domains, Iterable doesn't need to decentralize the data team yet - the cost of decentralizing is far greater than the benefit right now.
- Iterable found the most value by doing exploratory data analysis, then quickly moving to a minimum viable consumable form, then continuing to improve the data set. This approach means fast time to value by grabbing the low-hanging fruit while continually driving towards better data and incremental value. But to do this, consumers must be very aware of what they are getting, and when :)
- A key way to keep stakeholders informed and bought in is to constantly keep them updated on progress - Jen Tedrow talked about this in her episode too. Keep people informed of progress and ongoing investigations so you can stay coordinated and all parties understand decisions along the way.
- At Iterable, conversations between domains happen weekly, so there is always space for people to keep each other updated on upcoming changes or new information. That's far harder for organizations with many domains, but it's still useful general advice - much like in any good relationship, schedule the time to exchange context!

While Iterable is too early in its life to fully implement data mesh, they are taking some major inspiration from it in how they enable domains to appropriately share data with each other, per Riya. When Riya joined, they were under 100 people and had a very fractured, one-off analytics approach built on a lot of disparate Excel spreadsheets. It was very hard to make data-driven decisions even within many of the domains, but especially on cross-domain questions - which are often the most crucial to companies. Each domain had a relatively high data maturity - enough that the centralized BI/data team thought the domains could and should still own at least some of the responsibilities for their data. But, per Riya, to move out of a fractured environment, they needed to at least consolidate information sharing into a single place or tool and start to standardize how information is shared and combined/integrated across domains. An example Riya gave: the Product team's data was stored in a very different way from the sales and marketing information, yet Sales and Marketing both wanted to consume metrics on how customers and prospects were using the service. Integration across domains was extremely hard to do. They also needed a standardized ownership model across domains - what would the domains own and what would the BI/data team own?

So, what was the driving factor that pushed Iterable out of the dreaded "spreadsheet hell"? Riya mentioned that the executives were not able to easily, repeatably, and reliably get answers to their questions in a timely manner. Many crucial, exec-level questions are not domain-specific and require data from multiple domains. And most exec questions aren't single point-in-time questions, so you need reliable processes to support answering them now and into the future. Exec pain at not being able to quickly make decisions backed by data - especially data they could trust - meant there needed to be a change.

Riya gave a few specific examples of big questions they couldn't answer in the fractured setup (see the sketch after this list):
1) What makes for a good target prospect? You need to combine sales and marketing data about company type, size, etc. with purchase history, then combine that with product usage data to see which types of companies were actually using which features - then layer in renewal rates and how much each customer expanded. It could also give them the ability to target pitches based on which features were used by which types of companies.
2) What is the actual ROI on marketing spend? Scott, coming from FP&A, agreed this is a notoriously difficult question without really clean data.
3) What features or business programs should we kill or reduce investment in? To cost-optimize, you need clean data on what is actually happening with the business and product, down to quite granular levels, so you aren't making crucial decisions based on gut.
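As a toy illustration of why these questions force cross-domain integration, here is a minimal pandas sketch joining hypothetical per-domain extracts. Every table, column, and number below is invented for the example - this is not Iterable's actual schema or stack code.

```python
# Hypothetical per-domain extracts - all names and values are invented to show
# why exec-level questions need data from several domains joined together.
import pandas as pd

marketing = pd.DataFrame({"account_id": [1, 2, 3], "campaign_spend": [500.0, 800.0, 250.0]})
sales = pd.DataFrame({"account_id": [1, 2, 3], "first_year_revenue": [2000.0, 600.0, 900.0]})
product = pd.DataFrame({"account_id": [1, 2, 3], "weekly_active_users": [40, 3, 12]})

# No single domain can compute marketing ROI or spot weak accounts on its own;
# the insight only appears once the three views are integrated.
combined = (marketing
            .merge(sales, on="account_id")
            .merge(product, on="account_id"))
combined["marketing_roi"] = combined["first_year_revenue"] / combined["campaign_spend"]
print(combined.sort_values("marketing_roi", ascending=False))
```

Doing this reliably every week - rather than by hand in a spreadsheet each time - is essentially the standardization problem the central data team took on.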
To cost optimize, you need clean data on what is actually happening with the business and product down to quite granular levels so you aren't making crucial decisions based on gut.Per Riya, one very important outcome of the work to combine data across domains: identifying additional gaps in their data and analytics. Were they collecting the information to answer these new questions? If the domain had the information, could they share it? Once basic questions were more or less answered, they could see where they could do better on what information they collected and shared to drive deeper, high-value insights.When starting to build out the cohesive, company-wide data platform, Riya and team looked at a number of tools. She said they made a few good choices and a few things that she would do differently. They focused a bit too much on trying to provide a really simple UI/UX instead of just getting data sharing and analyzing capabilities in people's hands and then working to improve from there. They eventually saw really big value in getting people doing initial exploratory work quickly - reducing the time to market of people being able to use even base-level features was very valuable. They also went with something that made ETL far harder to manage than it should have been in Riya's view. Sometimes those more expensive offerings like Fivetran will have a much better return on investment - ask yourself how much of your time will be spent managing a system instead of value-add work and if that would be better spent on buying a tool. As Doron Porat mentioned in her episode, it's rarely a super easy choice but too often, people opt to try to roll their own when it's more valuable to focus on higher value-add work.They are now using Snowflake, Fivetran, Looker, and dbt.Iterable saw a lot of value again from the initial exploratory work between teams driving insights - once they found a good use case, the teams came to Riya and team to build out the data models that could easily combine that data between the domains that would allow for high quality, trustable data in a format that was easy to use. Essentially, once there was a clear use case, the central data team took over ownership to ensure and maintain quality. As seen in many previous interviews, this central ownership model scales until it doesn't. Right now for Iterable, centralized ownership of what might be considered consumer-aligned datasets in data mesh makes sense. Data mesh, as Zhamak has envisioned it, is not for all companies and a hybrid ownership model like Iterable is using can scale quite well for organizations with not that many domains.So v0.1-1.0 of Iterable's data platform was Snowflake + Stitch + Looker, then v2.0 was Snowflake + Fivetran + Looker + dbt. Riya and team are starting to work on v3.0 of their data platform to support some initial data science / ML use cases. What they've found is far too much of the central data team's time is spent on manual tasks so they will be focusing on these new ML use cases as well as building in more automation and optimization.A few interesting things from Iterable's approach: 1) by starting with exploratory analysis first, they could discover low hanging fruit insights while working to elevate the data set to production quality. Some things were just obvious in the data even before it was high-quality. Getting to an early initial consumable form and iterating towards higher quality drove value sooner; 2) the domains are constantly in communication with weekly check-ins. 
A few interesting things stand out from Iterable's approach: 1) by starting with exploratory analysis first, they could discover low-hanging-fruit insights while working to elevate the dataset to production quality - some things were obvious in the data even before it was high quality, and getting to an early consumable form and iterating towards higher quality drove value sooner; 2) the domains are constantly in communication via weekly check-ins, which gives them scheduled time to keep each other informed - easier to do in a company with 4-5 domains, but it means fewer surprises and more high-value collaboration; and 3) something that has worked well for Riya's team is constantly keeping stakeholders updated as data work progresses - by keeping people in the loop, there is a tighter feedback cycle if expectations aren't aligned or start to diverge, meaning far less chance of wasted work.
Riya wrapped up by mentioning how crucial high-context conversations are to making your data strategy work, no matter your data management approach. If the domains were each just trying to drive their own data, the overall company would be flying blind. So find good ways to keep each other informed and exchange context!
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Aug 21, 2022 • 36min
Weekly Episode Summaries and Programming Notes – Week of August 21, 2022
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Aug 19, 2022 • 1h 17min
#115 Understanding the Data Value Chain - Your Key to Deriving Value from Data - Interview w/ Marisa Fish
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.
Marisa's LinkedIn: https://www.linkedin.com/in/marisafish/
Obeya method: https://obeya-association.com/what-is-an-obeya/
MIT course on "The Science of Intelligence": https://cbmm.mit.edu/education/courses/science-intelligence
John Duncan paper on brains executing series of programs: https://web.mit.edu/9.s915/www/classes/duncan.pdf
In this episode, Scott interviewed Marisa Fish, Director of Information Management at American National Bank. To be clear, Marisa was only representing her own views on the episode.
Some key takeaways/thoughts from Marisa's point of view:
Understanding your data value supply chain - the way you derive and deliver value from your data - should be the crux of data and analytics work. The data value supply chain breaks down into sharing the data itself, sharing analytical insights about the data, and managing the data. All three are crucial to creating value from your data.
Intentionality is crucial - instead of being reactive, stop and ask what you are trying to accomplish and what value it will drive. Then you will focus much more on high value-impact work.
Similarly, think about systems engineering work as "mission engineering" - what is your mission in doing your work? Does the work you are prioritizing serve the mission?
When sharing information, start from: what is the point, what am I trying to drive with this information exchange? Are you trying to share one person's way of thinking or insights, or give others the capability to derive their own insights from the new information? Both are very valid and useful, but it's easy to talk past each other if you're not on the same page.
So much of the way most organizations work with data is about the known knowns - the data consumer knows what data they want and what questions they want to answer with it. We need to enable people with questions to find the right data to address them, and to enable people to do data spelunking with data when they aren't sure what it might tell them. Look to the Library and Information Sciences space for how to approach that.
We need data librarians, not data publishers. Data publishers are about putting data on the shelf and serving only the known knowns. Data librarians are there to help people find the information they need to address more of the unknowns - the value of curiosity in driving incremental valuable insights.
There is a major mismatch in most organizations between what insights the business units are producing and the key questions the C-level execs care about. Consider creating a Chief Data Analyst type role to pair with execs to make sure insights are produced to support their initiatives, not just answer their questions as they come up. Think ahead, build ahead.
Data teams need to take far more practices from general engineering - not just software engineering - so we learn how to better understand requirements.
When gathering requirements, expecting data consumers to know all of their requirements upfront can lead to them asking for the world and a bad mismatch between asks and needs. Look to new ways to exchange information about requirements, including the Japanese Obeya technique. Spend the time to ensure you understand how data consumers will derive value from the information you will share with them. That will give you a better understanding of how best to serve them and which stated requirements might not be quite so required.
Anytime you are sharing data, it's easy to get bogged down in the 1s and 0s. Ask yourself: what information am I trying to share and why? How is this driving value?
It's very important for data producers to really dig into use cases, both to prioritize the work and to make sure you aren't over-optimizing or under-delivering on the value. What is the point of the work? And what do the data consumers want: analysis/insights or data?
Data producers/owners are often not willing to openly share all of their data. A big reason is compliance - with internal policies, regulations, etc. So a high-context exchange about how a data consumer will actually use the data can lead to more data being openly shared - the producer can be assured there won't be non-compliant use.
Marisa started the conversation by sharing that she is hard of hearing and how that has strongly shaped the way she interacts with the world. Because she often misses certain words in conversations, she - and her brain's linguistic processing - has to work in a very different way to fully comprehend what is being discussed. We should think about a similar approach to data: we won't always have all the context, so how can we take neuro-linguistic approaches and human information-processing frameworks and apply them to data to expand our understanding of it?
For Marisa, any time you are sharing information in the form of data, it's important to understand that it isn't just machines communicating with each other. In all forms of sharing information and knowledge, you should ask: am I trying to mimic and share one person's way of thinking, or am I trying to augment the way of thinking of the audience? In other words, am I trying to share one person's understanding, or am I trying to give someone else the information to create insights and deepen their own understanding? Both are very valid and helpful, but really focus on: what is the actual goal of the information exchange?
Marisa recently moved from a 25+ year career in the US Department of Defense into the financial sector. So she is learning a completely new "language" - actually several: the terminology, the business terms, the ways of exchanging information, the way information and requests flow, etc. From her years of working with very high-impact information exchanges in the DoD - intel drives foreign policy and can put many lives at risk - she understands the cost of data producers and consumers not aligning. So she recommends starting from a conversation about what the point of a data request or piece of work actually is. And, as Jean-Michel Couer said, ask it not in a combative way - but it's crucial to ask.
It's really crucial to dig into the use case and business need for a few reasons. That context exchange is crucial to driving any data initiative forward, including prioritization. Data producers asking consumers what they are trying to achieve means the producers have the context to better serve their needs, instead of the data consumers having to know every requirement upfront - they have far too many known unknowns about the data. It also ensures there is a business reason for the ask: how are you, the data consumer, going to derive value for the business from what we plan to share? If it isn't going to drive value, is there a benefit to doing the work?
Marisa and Scott discussed how difficult it can be to openly share data internally without really knowing the downstream use cases. As past guests have noted, without understanding the exact use, domains in most cases will not share as much of their data. Why? Because it's very easy to slip into non-compliant use. It's also easy for people to misinterpret and misuse data if you don't give them the context to truly understand what it means. So to have a more open sharing environment - especially in an industry or organization where data policies are stringent - sharing context with each other is crucial. Data consumers must share their target use and how they will prevent misuse.
The data value supply chain - which, in Marisa's view, is the core way to think about how you derive and deliver value from data - breaks down into three parts: the data itself, the data insights/analytics, and the management of the data. If we look at data mesh through this lens, we are asking domains to at least take on sharing the data and managing the data - with a lot of help from the platform. But as many guests have noted, data mesh practitioners are somewhat split on how far the mesh data product extends - do you want to prepackage insights for consumers, or package the information up so they can derive their own insights? I think we are converging on an early pattern of: both, where possible and valuable. Look to Xavier Gumara Rigol's episode for an in-depth dive into this.
Marisa shared about her work in the Library and Information Sciences world and how it applies to data: when it comes to exchanging data, so much of what we currently do is about the "known knowns" - the consumer knows what data they want and what questions they want to answer. But that is only one of the four quadrants of information (a small mapping of the quadrants follows below). While it can be truly difficult to grasp the unknown unknowns - not knowing what information sources or data you want, nor what questions you want to answer - the known unknowns and unknown knowns are crucial to expanding our understanding of what is occurring within our organization. Known unknowns: I have questions but am not sure what data can help me answer them. Unknown knowns: I know I want to analyze and leverage this data, but am not sure what it can help me answer or what it is telling me.
There is a big difference between a publisher of data and a data librarian in Marisa's view - and you should look for the data librarians. A publisher of data just provides the 1s and 0s, not the real information. A data librarian facilitates finding information. So how can we do that at the organizational level? It's difficult enough to get people to be that data librarian for their own domain - how can we do it across domains? Does that need to happen at the platform level or the human level? Or do we want both, overlapping to serve as much as we can?
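As a quick reader's aid for those four quadrants - this mapping is just an illustration of the labels, not anything from the episode:

    # The four quadrants of information, per the Library and Information
    # Sciences framing above: do you know the question, and do you know the data?
    quadrants = {
        ("question known", "data known"):     "known knowns: most org data work today",
        ("question known", "data unknown"):   "known unknowns: I have questions but not sure what data helps",
        ("question unknown", "data known"):   "unknown knowns: I have data but not sure what it can tell me",
        ("question unknown", "data unknown"): "unknown unknowns: pure exploration / data spelunking",
    }
    for (q, d), meaning in quadrants.items():
        print(f"{q:16s} / {d:12s} -> {meaning}")

The data librarian's job, in this framing, is moving people out of the unknown quadrants toward ones where questions and data find each other.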
Far too many organizations treat the data team as a service-based model, per Marisa - a common theme across many articles over the last few years. A service-based model makes you likely to be run as a cost center instead of a profit center. So how do you switch that perception? Part of it is doing work that is directly tied to the data value supply chain. For every bit of work, ask: how will value be derived?
But, on moving away from a service-based model, Marisa and Scott agree there still needs to be someone pairing with C-level executives to make sure information is gathered and collected to support their key initiatives - a kind of Chief Data Analyst. Rather than ad hoc questions and "I'll get the information for you", this is someone aware of the company's key initiatives who ensures insights are being manufactured to support them. There is a major mismatch in most organizations between the insights business units produce and the key questions C-level execs care about.
For Marisa, to identify what to do to satisfy a data consumer's needs, you need to really understand their way of working. Do the operational business process mining to figure out how to best serve them as you work to build out the use case.
Marisa shared her feelings about getting overly focused on the small picture - the micro - and how that plays into the big picture - the macro. If you spend too much time focusing on individual use cases, especially when gathering requirements, will you be able to scale up sharing data internally? Possibly look to the Japanese requirements-gathering technique called Obeya - or "open room". A data producer can lay out a considerable number of possible requirements, and the data consumer can opt in to those requirements. The data consumer can then avoid some of the unknown knowns - what don't they know about the data? From there, there can be an open exchange about those requirements between data producer and consumer.
In data, so much of the work most organizations do is a direct reaction to a request or question instead of focusing on "how are we going to derive value from this work?", per Marisa. We need to take practices from engineering - not just software engineering - on requirements gathering and understanding. How many people in the data space have really trained in proper requirements gathering? Is this mismatch between gathered asks and what the project actually needs to succeed the reason why so many data initiatives fail? Stop simply reacting to requests; ask why this matters and what value it will drive. Scott's favorite data mesh word: intentionality. Think of it as "mission engineering" - why are you doing this? What is our mission, and does this work support it?
Per Marisa, when we as humans collect information ourselves - not via our systems - we mostly do it through hearing and vision. So how do we think about our electronic information collection systems? Can we get our systems to better mimic the way humans collect and process information? Should we try to mimic humans exactly, given that humans have inherent bias? What can we learn from the way humans collect and process information, and then improve upon it? What is the science of intelligence, and how can it shape the way we build systems? Our brains execute a series of programs for information processing - how can we leverage multiple frameworks to do the same?
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf