
Data Mesh Radio
Interviews with data mesh practitioners, deep dives/how-tos, anti-patterns, panels, chats (not debates) with skeptics, "mesh musings", and so much more. Host Scott Hirleman (founder of the Data Mesh Learning Community) shares his learnings - and those of the broader data community - from over a year of deep diving into data mesh.
Each episode opens with a BLUF - bottom line, up front - so you can quickly absorb a few key takeaways and decide whether an episode will be useful to you. There's nothing worse than listening for 20+ minutes before figuring out whether a podcast episode is going to be interesting and/or incremental ;) We're hoping to provide quality transcripts in the future - if you want to help, please reach out!
Data Mesh Radio is also looking for guests to share their experience with data mesh! Even if that experience is 'I am confused, let's chat about' some specific topic - yes, that could be you! You can check out our guest and feedback FAQ - including how to submit your name to be a guest and how to submit feedback, anonymously if you want - here: https://docs.google.com/document/d/1dDdb1mEhmcYqx3xYAvPuM1FZMuGiCszyY9x8X250KuQ/edit?usp=sharing
Data Mesh Radio is committed to diversity and inclusion, including in our guests and guest hosts. If you are part of a minoritized group, please see this as an open invitation to be a guest - just hit the link above.
If you are looking for additional useful information on data mesh, we recommend the community resources from Data Mesh Learning. All are vendor independent. https://datameshlearning.com/community/
You should also follow Zhamak Dehghani (founder of the data mesh concept); she posts a lot of great things on LinkedIn and has a wonderful data mesh book through O'Reilly. Plus, she's just a nice person: https://www.linkedin.com/in/zhamak-dehghani/detail/recent-activity/shares/
Data Mesh Radio is provided as a free community resource by DataStax. If you need a database that is easy to scale - read: serverless - but also easy to develop against - with many APIs including gRPC, REST, JSON/Document, and GraphQL, all of which are OSS under the Stargate project - check out DataStax's AstraDB service :) Built on Apache Cassandra, AstraDB is very performant and, oh yeah, also multi-region/multi-cloud so you can focus on scaling your company, not your database. There's a free forever tier for poking around/home projects and you can also use code DAAP500 for a $500 free credit (apply under payment options): https://www.datastax.com/products/datastax-astra?utm_source=DataMeshRadio
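For a feel of what "easy to develop against" means here, below is a minimal sketch of reading a row over Stargate's REST API on Astra. The database ID, region, keyspace, table, and key are placeholders, and the endpoint shape and auth header are based on the public Stargate/Astra docs as I recall them - check the current docs before relying on this:

```python
# Hypothetical sketch: fetch a row by primary key via Stargate's REST v2 API.
# Replace the placeholders with your own Astra database ID, region, keyspace,
# table, primary key value, and application token.
import requests

ASTRA_BASE = "https://<db-id>-<region>.apps.astra.datastax.com"  # placeholder
TOKEN = "<astra-application-token>"  # placeholder - never hardcode in practice

resp = requests.get(
    f"{ASTRA_BASE}/api/rest/v2/keyspaces/my_keyspace/users/user-123",
    headers={"X-Cassandra-Token": TOKEN},
)
resp.raise_for_status()
print(resp.json())  # rows matching the primary key, as JSON
```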
Latest episodes

Feb 3, 2023 • 1h 17min
#189 Our Data is In the Cloud… Now What? - Interview w/ Vikas Kumar
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here. Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.

Vikas' LinkedIn: https://www.linkedin.com/in/vksnov9/
Vikas' Twitter: @vikaskumar9 / https://twitter.com/vikaskumar9
Vikas' email: vikaskumar9 [at] gmail

In this episode, Scott interviewed Vikas Kumar, AVP and Head of Data, AI, and ML at CNA Insurance. To be clear, he was only representing his own views in this episode.

Some key takeaways/thoughts from Vikas' point of view:
- In data mesh, stay focused on bringing the business domains along. You aren't building for the sake of building. If users can't derive value from the data work being done, why is it being done?
- The 2010s through the early 2020s have been about moving data to the cloud, but we are starting to see people really leverage that data to generate value. The cloud unlocks many new possibilities around data due to flexibility, scalability, and unit economics.
- With the move to cloud, there is much less focus on specifically managing the data and more focus on getting value from the data. SaaS data product offerings really free up people's time to focus on driving value.
- Cloud gives us the scale and data availability, but there is still a long way between having the data available and leveraging the data for significant value.
- Cloud can be a double-edged sword - it gives you flexibility and scalability, but without good controls you are likely to do a lot of duplicate work. Be careful that ease of data product creation - or at least PoC creation - doesn't create chaos and data product overlap. Make sure to have good governance here, including strong communication.
- ?Controversial?: We aren't very good yet at making it easy for business domain users to leverage data in many of their decisions. Where do we fall on the spectrum between teaching them how to do everything data and curating everything for them?
- ?Controversial?: It's easy to focus too much on short-term quick wins in data. You need to think about your overall data landscape and build a foundational approach so you can go after big-picture, big-impact bets with your data work. Build every data product from a foundational approach too, to make them more extensible.
- We need to get people out of their functional silos, with business people only speaking business and data people only speaking data.
- To do data mesh well, we have to focus on the organization's operating model around creating and maintaining data products. There is too much focus on the technical aspects instead of on how this actually gets done in a way that fits with the organization's ways of working.
- Data producers must assess data consumers' data fluency levels. If consumers aren't very strong with data, should you really be delivering them raw data instead of curated insights?
- For any data product, you should start by mapping it to a target outcome. But it shouldn't stop there, because with reuse, new outcomes may emerge that drive additional value.
- Data product owners are crucial to building good data products. It's their job to identify and then satisfy the objective of building the data product. What are you trying to achieve?
- We shouldn't focus only on the data product - the work to create that data product is what makes it valuable. The data product is merely the vehicle for delivering the value, the output of real product work around data.
- ?Controversial?: Many companies doing data mesh appear to be trying to leave data governance until 'later', and that is likely to bite them. Governance here means security/access control but also interoperability. You might not need to implement all of your data governance upfront, but you should plan out your general governance strategy very early in a data mesh journey.
- Access control is a really hard problem. Many organizations don't have good communication or visibility into who is using what data, and especially how/why. We need to be asking these questions and then setting access policies that expire too - we should check in to see if people still need access. That's just good governance.

According to Vikas, from 2010 through the early 2020s the focus was on moving data to the cloud to better drive value. And now that more and more of our data is in the cloud, we are starting to see much broader adoption of things like ML and AI. The cloud gives us the promised but under-delivered scalability of the "big data" technologies, along with the flexibility to move quickly and experiment. Cloud can also make it easier to bring non-data people into the mix to drive better collaboration between the data people and the business people/domains. So cloud gives us this massive scale and data availability, but we still have to learn to better leverage our data and drive value from it - as an industry, we are still in pretty early days there.

A big outcome of the mass movement of data to the cloud is the shift in how much time is spent on data management versus getting value from the data, according to Vikas. DBAs used to spend 60%+ of their time just managing the data, but data people's time is now focused on getting value, with probably only 10-20% spent specifically managing the data. But cloud can be a double-edged sword too - if it's very easy to create new data products or beta data products, you have to be very careful not to create overlap/duplicate work/data products. It all comes down to governance and your operating processes to prevent that.

As an industry, we are getting much better at serving data reliably at scale according to Vikas, but we still struggle with the gap between the data being available and the data being usable by consumers in the business domains. We are still working on figuring out where to meet in the middle between handing people reports and maybe dashboards - a kind of old-school approach - versus upskilling them to very high data fluency so they can build everything themselves. When asked that question - do the data people have to learn all the business context or vice versa - Vikas gave the very data mesh answer of "it depends." But that makes sense, because there shouldn't be a single prescribed method; you have to look at how your organization works and fit with that model. And you probably want to meet somewhere around the middle. Otherwise, you will cause unnecessary friction. So look to your general ways of working, cross-train people, get people exchanging context about what they are trying to achieve, and instill a culture of feedback and collaboration. That's how you can actually execute well on a data mesh strategy.

Vikas talked about your data strategy north star being about getting value from your data, reliably and at scale. So you need to be realistic about where you are in that capability journey right now. As a data producer, you need to assess whether your data consumers can do everything necessary if you give them raw data, or whether you should be curating it for them so they can actually leverage the insights. Work to find the high-value-return data work early instead of trying to do the most complicated aspects of data. It's okay to start small, no shame there.

A data product should always map to a target business outcome according to Vikas. But that shouldn't be the only factor. The reason for creating a data product should be trying to achieve that outcome, so use it as the north star for the data product, but we must build in a way where data products can be reused - sometimes with some additional work - for additional use cases. And it's really crucial to have a data product owner who is discovering and focusing on the objective of the data product. How can you provide the business meaningful data that meets their objectives? That should be a key objective of every data product.

When asked how we balance focusing on long-term wins instead of the quick - but typically small - wins, Vikas talked about the need to create a holistic view of your data and build a very strong foundation for how you will deal with data in general. That makes it so you can jump on the quick wins when you find them, but you also have a steady foundation for making much bigger bets going after long-term big wins. With a shaky foundational layer for your data, those long-term big wins are much less likely to pay off. And that foundational aspect comes in at the data product level too - build data products that can be easily extended when it makes sense because they are built to be extensible from the start. Kent Graziano in the recent data modeling panel railed against having to rebuild every time you extend a data product - don't do that :)

For Vikas, there are many value streams for a data product - most people focus on the data set itself, but it could be the governance work or the collaboration conversations between producer and consumer. We need to focus less on the data product as the exact output; the data product is the vehicle for delivering value, and the overall product work itself significantly enhances the value of the data product.

Data governance seems to be the part of data mesh that confuses a fair number of organizations, so they ignore it at their significant peril according to Vikas. While you might not have to build every aspect of your governance upfront, it's crucial to think about how you will apply governance. And to truly get to the ideal of a self-serve platform, governance needs to be a simple part of the ways of working. Saving it for later is not going to end well for many organizations. And while access control is hard, we need to get far better at understanding who is using what and _why_. How long should someone get access to data? Forever access should be a non-starter. And how do we make it easy to grant that expiring access?
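To make Vikas' expiring-access point concrete, here is a minimal sketch of time-boxed access grants - not from the episode itself, just an illustration of the pattern assuming a 90-day default TTL; all names are hypothetical:

```python
# Hypothetical sketch: access grants that expire by default and record the
# use case, so governance can periodically re-check whether access is needed.
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class AccessGrant:
    user_id: str
    data_product: str
    use_case: str  # why access was requested - see episode #188
    granted_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    ttl_days: int = 90  # forever access is a non-starter

    @property
    def expires_at(self) -> datetime:
        return self.granted_at + timedelta(days=self.ttl_days)

    def is_active(self) -> bool:
        return datetime.now(timezone.utc) < self.expires_at

grant = AccessGrant("uid-123", "claims-history", "churn model features")
assert grant.is_active()  # access lapses after 90 days unless renewed
```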
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Jan 31, 2023 • 16min
#188 Controversial Opinion: You Must Register Your Use Case With the Data Producer - Mesh Musings 42
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here. Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

I share my strongly held belief that every data product consumer MUST register their use case with the producer and why it is so crucial. I sum up with these points:
1. Data isn't ever going to be self-describing, at least not deep, context-rich data - and it definitely won't be as we focus on data democratization. So you need to register your use case to make sure you and the producer are on the same page about what is truly being shared.
2. Data producers don't know what data consumers want - and especially don't know what potential data consumers want - without more information/conversation. Data producers kinda suck at doing that right now, so help them out and improve the information flow. Who knows, they might have something way better for ya.
3. You might actually be able to get a heck of a lot more useful information if the producer 1) understands your use case and knows they can give you access to the PII/sensitive stuff because your use case passes ethical/legal/regulatory checks and/or 2) you are taking on the risk if you violate ethical/legal/regulatory bounds. You need to communicate what would most effectively help you achieve a target outcome and have a reasonable, collaborative negotiation between data producers and data consumers. Just consuming from a data product as-is might not meet your needs well enough.
4. Kinda similar to #1, but registering gets you a much better understanding of what you are really consuming. Does it mean what you think? Did you create an amazing insight - how can you collaborate with the producer to show that off? How can you make sure you will continue to get what you need as the data product evolves? (A sketch of what a minimal use-case registration might capture follows these notes.)

I'll be digging somewhat deeper on the next episode with a rant on data contracts.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
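As referenced in point 4 above, a minimal sketch of what a use-case registration record might capture. Every field name here is hypothetical - the episode argues for the practice, not for any specific format:

```python
# Hypothetical sketch: the information a consumer registers with a data
# producer so both sides share context as the data product evolves.
from dataclasses import dataclass

@dataclass
class UseCaseRegistration:
    data_product: str           # what is being consumed
    consumer_team: str          # who to talk to when the product changes
    target_outcome: str         # the business outcome being pursued
    fields_needed: list[str]    # helps producers evolve without breaking you
    needs_sensitive_data: bool  # triggers ethical/legal/regulatory review
    review_by: str              # ISO date to re-confirm the use case is live

reg = UseCaseRegistration(
    data_product="orders-enriched",
    consumer_team="growth-analytics",
    target_outcome="reduce checkout abandonment",
    fields_needed=["order_id", "abandoned_at", "channel"],
    needs_sensitive_data=False,
    review_by="2023-07-01",
)
```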

Jan 30, 2023 • 1h 4min
#187 Maximizing the Value of Your Data Through Data Products - Interview w/ Bruno Aziza
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here. Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.

Bruno's LinkedIn: https://www.linkedin.com/in/brunoaziza/
Bruno's Medium: https://medium.com/@brunoaziza
Bruno's YouTube (Carcast videos): https://www.youtube.com/@brunoaziza

In this episode, Scott interviewed Bruno Aziza, Head of Data and Analytics at Google Cloud.

Some key takeaways/thoughts from Bruno's point of view:
- The end goal of your data strategy should be to reliably and scalably turn data into value. The best way to do that is by creating data products. How you get there might differ, but don't lose focus on turning data into value.
- "The number one barrier to your ability to drive value of data is not your technology, it's your people and how you organize your team."
- Focus on the point of what you are trying to deliver, not the actual output. It's not about delivering a dashboard, it's about creating a sustainable way to explore, share, and consume information/insights, whatever form that takes.
- !Controversial!: There are three phases to getting data driven: 1) building a data lake or ocean, 2) data mesh, and 3) getting to a data product factory equivalent.
- It's easy to put the cart before the horse in data. Before doing something like data mesh, you have to think about how you will develop data as a function in your organization.
- Understanding the data product manager role and leveraging data product managers well is crucial to building an effective data product strategy and practice. They are your data product CEOs.
- A CDO's effectiveness depends on whether they have a true seat at the exec table - can they create the necessary change? - and how many people in the organization are "devoted to the data opportunity."
- ?Controversial?: "Really smart data leaders hire for the business department." Understanding whether someone cares about data and whether you can work with them is important. If that potential head of marketing hire doesn't care about data or has low data fluency, will it be possible to work together?
- The companies doing the best on data literacy are making data a crucial part of their culture. Their daily practices increase data fluency across the entire organization rather than in a centralized data team. It's not just about training, it's about changing habits.
- Data driven companies are "162% more likely to surpass their revenue goals."
- "We're seeing people kind of rushing into migrating and not thinking about governance."
- ?Controversial?: Two components significantly increase the chances of successfully transforming into a data driven organization: 1) a true organizational mandate to become data driven and work on data products, and 2) an organization with the attitude and aptitude to drive towards being data driven.
- There isn't a clear pattern yet for the best way to find your data product managers - teach the business aspects to the data people or vice versa. But it's clear that understanding what value data products drive - not just the ins and outs of the data product itself - is crucial.
- More and more, the centralized data team model is getting swamped in large organizations. But too many are too happy to fully decentralize, which also causes many issues. Federated - decentralized control and work but centralized collaboration and practices - is the approach seeing the most success.
- ?Controversial?: There are 3 types of data products: 1) internal domain-focused (data on the inside, not very reusable), 2) core, centrally managed, and 3) everywhere in between :)
- A major issue in organizations that focus on empowering domains without interoperability is that - surprise, surprise - there are different semantic meanings, so the data becomes very difficult to integrate/interoperate.
- Focusing too much on quick wins will mean you miss out on the places where data can add a lot of value. Quick wins are typically not big wins - or you find the big wins fast and then all your following quick wins are moderately sized at best.

Bruno started off with something you don't often hear a vendor say: "The number one barrier to your ability to drive value of data is not your technology, it's your people and how you organize your team." So while you can't buy your way to a data mesh, you also can't just flip a switch and be doing data mesh. You need to build your organization's capabilities to the degree they can actually derive value from their data.

It's also not easy for a data leader to create the necessary change, per Bruno's conversations with data leaders. Many don't get a true seat at the executive table. And even if they do, if there aren't enough people "devoted to the data opportunity," it will be a very hard road to drive the data function to where it can add significant value. Bruno also dove into what he's seeing that makes for a high data literacy rate at customers - changing the day-to-day interaction, the habits of working with data. Making data part of many more people's roles and making it an intentional part of the company's practices/habits builds an incredibly deep bench of data talent across the organization. So his three components of positive change in a company's data approach are a strong data leader, the proportion of people committed to data work, and daily practices involving data.

While we know being data driven has an advantage - data driven companies are 162% more likely to surpass their revenue goals per a study - Bruno sees a few reasons why only 27% of companies are actually data driven now. To be data driven, you need to reliably produce data at scale, hence creating data products. And to do that, you need to build out the capabilities to handle data at scale - and not skip the governance :) But the end goal is to provide a reliable way to create value from data. That's really it. The best way to reliably do that is via data products, in his view.

Bruno is seeing people go through three phases in getting to a reliable, scalable way to turn data into value. Phase 1 is the data ocean - it's not a lake, that's landlocked. The second phase is data mesh, allowing people to autonomously innovate with data while relying on central resources. And the third phase is a data factory. Scott note: the factory analogy might be rough because 1) feature factories are a very bad software pattern and 2) factories are notoriously about producing the same things at scale. And while we want scalable ways of creating data products, they should be more fit for purpose to use cases (but of course reusable as well) in my view.

The data product manager role is crucial to getting data products right according to Bruno. You need someone to be the CEO for your data product - someone focused on the actual value the data product drives and how reliably the data product is created and maintained. What more should be added to the data product? How is it used? To drive that cultural shift, you need a strong leader of the data organization who is empowered to make the right changes.

For Bruno, there are two factors that significantly increase the chance of an organization successfully becoming data driven. The first is an organization-wide mandate that data matters and that people must participate in the change and leverage data - especially if the CEO is bought in on the data opportunity and the need for more and better data, for themselves especially and for the organization more broadly. The other is the attitude + aptitude to actually go out and build a scalable capability to build data products. And that's far easier said than done. It can be driven centrally or in a distributed way, but you need people to step up and own the data.

The centralized data team model is becoming harder and harder for companies to scale according to Bruno. The team needs to be constantly ahead of the curve, and they don't have the ability to learn all the necessary context, so they quickly get overwhelmed by requests. This was a key factor in Zhamak creating data mesh as a concept. But the teams that are just fully decentralizing are creating data silos and making it increasingly hard to answer cross-domain questions. So the organizations that are doing a federated approach with a strong sense of overall collaboration are winning - there are things that are centralized and things that are decentralized, and each organization needs to figure out what works for them, but balance is crucial. Find the right approach for the job.

Bruno talked about those companies that focus more on empowering domains than on the bigger picture of how domains can also work together. No major surprise, but it creates data silos because everyone has different definitions, so nothing is easy to integrate/interoperate. This is leading to the rise of the idea of the universal semantic layer.

Quick tidbits:
- Data leaders should be involved in the hiring process for business people. That way, you can start to build a relationship early and help select someone who values data and has decent data fluency. You don't want to be left out of the process.
- It's absolutely okay to have domain-only data products that are very specialized to that domain - basically data on the inside in a data product. It's also - per Bruno - okay to have very centralized data products that are pretty core across the organization. But look for places to build reusable data products to get the most leverage from your data work.
- To do data products and data product management right, you can't only focus on the data product launch. Maintenance and growth/evolution are crucial aspects of product thinking.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Jan 29, 2023 • 23min
Weekly Episode Summaries and Programming Notes – Week of January 29, 2023
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here. Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Jan 25, 2023 • 1h 8min
#186 Zhamak's Corner 16 - An Interpretation of Zhamak's Call-to-Action
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Sponsored by NextData, Zhamak's company that is helping ease data product creation.

For more great content from Zhamak, check out her book on data mesh, a book she collaborated on, her LinkedIn, and her Twitter.

We missed our window to record, so I am interpreting what Zhamak is saying in her Medium post about why she created her company and the general state of the tooling market around data mesh. I also added on the full 50-minute recording of our second session, which I had broken up into episodes 4-7.

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Data Mesh Radio episode list and links to all available episode transcripts here. Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Jan 24, 2023 • 21min
#185 How the Heck Do We Do Federated Computational Governance Part 1 - Mesh Musings 41
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here. Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

I will have much more to say on federated governance in data mesh, but to reiterate my points:
- Federated doesn't mean decentralize everything. Really focus on enabling fast decisions by the people who have the understanding.
- Shared ownership of responsibilities or oversight - not owned by a central team - can work. Stop thinking that this shoves all the ownership to the domain on day 1; they aren't ready for that.
- Start to plan out your governance roadmap pretty concretely at the start, but don't worry about having your ideal governance setup upfront. I've talked about CYA, or cover your butt - think about what makes sense and what you can build as a skeleton upfront rather than trying to build a fully functioning body.
- Start to think about interoperability at the concept level, that semantic level, but you don't need to have your ultimate interoperability plan from the start. You can enhance and develop it as you go. And go talk to people about what they are doing!
- Blueprints. Do all the blueprints. Please, find friction and find a way to make easy buttons for this. There are tens of episodes touching on this. Go look at what AgileLab is doing especially.
- Leave the automated access control stuff for later. Focus on fast requesting and granting of access.
- Create data quality standards of measurement. That way everyone can understand how much they can trust a data product and why. (A minimal sketch of declaring such standards follows these notes.)
- Data contracts are about a producer/consumer relationship. This ain't a transaction. Get to know both sides and build to an understanding of what information is actually shared.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
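Per the forward reference above, a minimal, hypothetical sketch of a data product declaring its quality standards of measurement so consumers can see how much to trust it. The metric names and thresholds are illustrative only, not a standard:

```python
# Hypothetical sketch: a data product publishes measurable quality standards
# (what is measured, the promised target, and the latest observed value) so
# consumers can judge trust instead of guessing.
from dataclasses import dataclass

@dataclass
class QualityStandard:
    metric: str      # e.g. completeness, freshness, uniqueness
    target: float    # the promised threshold
    observed: float  # latest measured value

    def met(self) -> bool:
        return self.observed >= self.target

standards = [
    QualityStandard("row_completeness_pct", target=99.0, observed=99.6),
    QualityStandard("freshness_within_24h_pct", target=95.0, observed=97.2),
]

# A consumer (or the platform) can check trust at a glance.
print(all(s.met() for s in standards))  # True -> product meets its standards
```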

Jan 23, 2023 • 1h 14min
#184 Ontologies Don't Have to Be Scary: An Ontology Primer - Interview w/ Neda Abolhassani, PhD
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here. Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.

Neda's LinkedIn: https://www.linkedin.com/in/neda-abolhassani-ph-d-61354329/
OSDU Ontology: https://github.com/Accenture/OSDU-Ontology

In this episode, Scott interviewed Neda Abolhassani, PhD, R&D Manager at Accenture Labs. To be clear, she was only representing her own views in this episode.

There's some very specific language about ontology in this episode, but I think it's quite approachable for most people as a good introduction to ontology, the difference from taxonomies, and some specific insight into developing and applying an ontology.

Some key takeaways/thoughts from Neda's point of view:
- When starting to develop an ontology, it's best to start from the business questions you want to answer. It's okay to choose bottom up or top down, but the business applicability is the main point.
- You can convince people that ontologies and knowledge graphs aren't scary or that hard to learn and leverage with a small demo of what they do and how to use them.
- Look for open ontologies that have already been created around your domain or the area you are trying to model. They can usually be easily augmented and extended - there's no reason to reinvent the wheel.
- Data people need to learn enough about the domain to build the right ontologies and data models, but learning domain knowledge can "discombobulate" them :) Get the data people with the subject matter experts to learn what's necessary.
- Try to keep your ontology as generic as possible while still encapsulating what you need; that way it is much easier to apply the ontology to other domains/departments. Set your ontology up to evolve as you learn more about your organization and as your organization itself evolves.
- You don't have to change your ontology simply because there are new use cases. If you designed it generically enough, it should be able to handle most new use cases, whether as-is or with a few additions.
- A good way to measure whether your ontology is good enough and still meeting your needs is to look at the business questions you want to answer. Are you able to answer them with your current setup?
- It's okay to have a global ontology and then more specific ontologies inside a domain if that is necessary to extract more value from the domain.
- ?Controversial?: Knowledge graphs need an ontology as well as a data model. So if you plan to leverage a knowledge graph for data mesh, you will need an ontology.
- When it's best to actually start deploying a knowledge graph and developing your ontology for data mesh - if you go that route - is still somewhat up in the air. The earlier the better in general, but it will mean more work while things are unsettled early in your data mesh journey. Basically, it depends.
- Constant schema changes can make designing and updating your ontology more challenging. Whether that means you should delay adding something immature to your ontology or wait until the schema settles a bit more is hard to say. Just know it can create additional challenges.
- ?Controversial?: To really have high-value interoperability in a data mesh, you need a way to capture not just the metadata of the data products but also the semantic meanings in the domains. And that should be done via a knowledge graph.
- Ontologies are richer than taxonomies because while both have definitions, ontologies also have description logic. This gives people the ability to define how data technically fits together across different aspects.
- OSDU, or Open Subsurface Data Universe, is a specific open source data platform for oil and gas subsurface data. Neda developed an ontology to go along with it.

Neda started off with a definition of ontology: "So literally, an ontology is a formal explicit specification of a shared conceptualization. I know that it has lots of jargon, but I'm going to explain it to you. So it is an abstract model of concepts, properties, relationships, and it is standardized, it is machine readable. And it is not just the instance level data, it doesn't include the instance level data, but it includes the schema and the type level information and how stuff should be connected in your domain."

When asked whether it's best to start top down or bottom up when building an ontology, Neda said either is acceptable, but the main advice is to start from the business questions you want to answer. After all, this isn't an exercise for fun; there needs to be a business purpose. And look for open ontologies for your problem statement or industry. There are a number of ontologies that have already been created that you can leverage, extend, and/or use for inspiration. There is no real reason to reinvent the wheel.

When building your ontology, Neda recommends keeping it as generic as possible. That way, you can apply it to multiple domains with no conflict. But it still has to meet your needs, obviously. There are ontology editors to make things easier as well, but it's important to set your ontology up to evolve as your understanding of your organization evolves and as your organization itself evolves. You can even do version control of your ontologies to make collaboration far easier as multiple parties look to improve the ontology simultaneously.

For Neda, ontologies and knowledge graphs go hand-in-hand. It's okay to have an ontology for the global organization and another one for a specific domain if that's of value. Ontologies are typically about communicating externally from the domain or whatever grouping you are representing, and knowledge graphs are for integrating data from different sources or domains. And for a knowledge graph, you need an ontology and a data model.

Ontologies are richer than taxonomies because while both capture definitions, ontologies also have description logic. That description logic gives you a better ability to define things like unions, intersections, restrictions, and equivalences. So ontologies are broader than just concepts and terms.

Neda then discussed OSDU, or Open Subsurface Data Universe - an open source data platform for subsurface data in the oil and gas space. She specifically saw a gap in OSDU where loading companies' own data into the OSDU format was pretty challenging: it required a lot of subject matter expert time to match schemas to the OSDU format. So Neda and team developed a technique using a knowledge graph and AI techniques to try to automatically match and map data in a company's own schema to the OSDU format. And as stated earlier, a knowledge graph needs an ontology :)

As Neda worked to build out the OSDU ontology, she looked at the OSDU canonical data format and reviewed the schemas to understand what embedded choices were made so she could ensure she captured them in the ontology. She looked at ontologies specific to related spaces, and even some that were part of the OSDU area of interest like seismic data. However, the ontologies that existed for oil and gas were mostly outdated and didn't really cover what was useful and interesting. An aspect of OSDU that made developing the ontology easier was that the schemas were not changing very often, so there wasn't a constant remapping and versioning challenge.

So, circling back to data mesh, Neda believes it's important to leverage a knowledge graph to really ensure good interoperability between domains and data products. A data catalogue - or other mechanism for discovering data in data mesh - that only has information about the individual data products and not how they interconnect won't have as much value.* When you should actually start to develop and deploy your knowledge graph is again something that requires more study and feedback. All else equal, the earlier the better, but of course as things are developing and changing rapidly early in your data mesh journey, trying to _also_ update your ontology will be a ton of extra work. Time will tell.

*Scott note: yup. Isn't that just high quality data silos? Even if they interconnect, if people can't easily understand and find the interconnections, you likely lose a LOT of the value of data mesh. Whether knowledge graphs and ontologies are the best approach remains to be seen, however.

Neda covered an aspect that is really important for all things data mesh: how to measure when things are good enough versus needing updating :) For her, it's about what she said at the start: what are the business questions you are trying to answer? If you are still able to answer those well enough, you probably don't need to change your ontology. But if those questions have changed considerably and your current implementation is not able to answer them well, your ontology will need to be updated - maybe some new concepts will be added and some old concepts deleted. You do want to be careful to keep things backward compatible as you deploy a new version of your ontology. Evolving ontologies is challenging, but if you designed your ontology well enough at the start, you probably don't need to do it all that often according to Neda. You should design your ontology in a generic enough way that it can handle new use cases without every little new aspect needing a whole new ontology version. However, that doesn't mean your ontology should never evolve. Things change or need clarification, and you should be willing to adapt. Scott note: this is where Zhamak sees challenges with ontologies: if they are overly centralized and overly rigid, they prevent people from expressing real meaning at the data quantum level because they are trying to fit the definitions of the data quantum into the ontology.
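To make the schema-versus-instance distinction concrete, here is a tiny sketch using the rdflib Python library: a made-up mini-ontology (the type level) plus a knowledge graph of instance data, queried with a "business question." This is not Neda's actual OSDU ontology; all names are illustrative:

```python
# Hypothetical sketch: a tiny ontology (classes + a property) plus instance
# data in a knowledge graph, using rdflib. The ontology is the schema/type
# level Neda describes; the instance triples are the knowledge graph.
from rdflib import Graph, Literal, Namespace, OWL, RDF, RDFS

EX = Namespace("http://example.com/ontology/")  # made-up namespace
g = Graph()
g.bind("ex", EX)

# Ontology (type level): concepts and how they are allowed to connect.
g.add((EX.Well, RDF.type, OWL.Class))
g.add((EX.SeismicSurvey, RDF.type, OWL.Class))
g.add((EX.covers, RDF.type, OWL.ObjectProperty))
g.add((EX.covers, RDFS.domain, EX.SeismicSurvey))
g.add((EX.covers, RDFS.range, EX.Well))

# Knowledge graph (instance level): actual data conforming to the ontology.
g.add((EX.well_42, RDF.type, EX.Well))
g.add((EX.well_42, RDFS.label, Literal("Well 42")))
g.add((EX.survey_7, RDF.type, EX.SeismicSurvey))
g.add((EX.survey_7, EX.covers, EX.well_42))

# A "business question" answered with SPARQL: which wells do surveys cover?
q = "SELECT ?well WHERE { ?survey a ex:SeismicSurvey ; ex:covers ?well . }"
for row in g.query(q, initNs={"ex": EX}):
    print(row.well)  # http://example.com/ontology/well_42
```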
In wrapping up, Neda shared her views on how to really get started on building out a good ontology and knowledge graph. It will require your data people to learn enough about the domains from the subject matter experts to develop the ontology. Be prepared for that to be a bit confusing, as sometimes learning a lot of domain knowledge can "discombobulate" your data people. And it won't be a super quick exercise. But Neda believes it will pay off in the end and add a lot of value.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Jan 22, 2023 • 14min
Weekly Episode Summaries and Programming Notes – Week of January 22, 2023
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here. Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Jan 20, 2023 • 1h 18min
#183 Business Intelligence's Place in Data Mesh - BI-gin With the End in Mind - Interview w/ Ryan Dolley
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here. Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.

Ryan's LinkedIn: https://www.linkedin.com/in/ryandolley/

In this episode, Scott interviewed Ryan Dolley, an independent Business Intelligence (BI) consultant.

Before we jump in: lots of things in the key takeaways are marked as potentially controversial because much of what Ryan covered hasn't really been stated by a lot of people. So until we have more consensus, of course things _could_ be controversial.

Some key takeaways/thoughts from Ryan's point of view:
- Begin with the end in mind. It's easy to lose focus on what you are trying to accomplish instead of what steps you are taking. Focus on the target outcome and use it as a North Star to measure whether you need to course correct.
- BI people need to brace themselves for a wave of innovation coming. There is so much - hopefully positive - change coming up the stack, and BI people can embrace it or get washed over by the wave. Embrace and ride that wave and upskill!
- ?Controversial?: Data mesh - and just about every other paradigm - does not focus enough on the last mile of analytics, at least not explicitly. So we need to get a lot more specific about what is necessary to actually take advantage of upstream improvements in data to deliver better analytics.
- !Important!: We need to get more specific about who does cross-domain BI in a decentralized world. Otherwise, we have interoperable data but no one specifically leveraging that interoperability to improve our understanding of the overall organization.
- BI as a practice needs to be much better at understanding and implementing iterative feedback. It hasn't been part of the playbook to date.
- ?Controversial?: If domains develop BI capabilities, how will they interact with a central BI team if one exists? Historically, they haven't gotten along well...
- ?Controversial?: Your central BI / data warehouse team should be converted into an insight generation engine team.
- We need to figure out whether BI is a primary role inside the domain or a skill people inside the domain need to develop as part of their roles. Which is preferable likely depends on the size, complexity, and data value of the domain.
- BI can't be a data mesh afterthought - 'we'll get to it when we get to it' isn't going to cut it. It might not literally be part of the data mesh implementation, but it must be part of your overall data strategy. Otherwise, why are you doing data work if not to generate and take action on insights?
- !Controversial!: The semantic layer is likely to play an important role in data mesh. Scott note: depends what you mean by semantic layer. See Zhamak's Corner #12 for more on the dangers of layers.
- BI teams must become part of the insight generation and resulting action conversations. That means implementing product thinking instead of simply responding to requests. But most are not ready for that mindset shift just yet.
- ?Controversial?: Domains should own an element of BI - not just share their data via data products, but generate and share their insights as part of data mesh.
- Product-oriented thinking is not widely prevalent in the BI world - e.g., most BI teams do not understand the usefulness/usage of their BI assets or even have monitoring set up to track usage.
- !Controversial!: The BI analyst in the board room as merely a live question-answering mechanism is a bad approach. It often leads to wrong answers with no meaningful quality control.
- BI people need to adopt product thinking and create meaningful relationships with users so they can evolve what they produce to continue to suit user needs. Blindly producing the same report for 4 years into the void needs to go the way of the dodo.
- !Important but Annoying!: We will probably have to meet some people where they are when it comes to sharing insights. Trying to upskill everyone into an SQL whiz isn't realistic. Allowing people to export into Excel for further manipulation isn't going anywhere...
- Part of product thinking is designing to the user. That's pretty rare in data. What is the data consumer specifically trying to do, and how can the producing team support that? That beats the common current method of sharing whatever the producing team wants to share.
- Dashboards aren't dead; they are the best way to answer known questions in an oversight-and-management kind of way.
- ?Controversial?: BI is possibly the most important part of your data culture, simply because the vast majority of your organization will interact with data mostly at the BI level. But too many organizations leave BI as an afterthought in data culture because it is where the most people need the most help and thus is the most difficult aspect. People change is hard :)
- ?Controversial?: In a successful data mesh implementation, there will be less work on the BI team because a lot more will be pushed down to the data product. The more the data product can share the insights inherently, without an interpreter - the BI team - in the middle, the better.

Ryan started with a bit of inspiration and/or a call to action for fellow business intelligence people: data mesh and a lot of other movements in data are creating a massive wave of innovation further up the data stack / data and analytics process. So BI people can either get with the wave and ride the changes to greater value or get washed over by it. Ride that wave :) Upskilling can be painful, but it can mean a major upgrade in capabilities and value. And being highly tech-capable in BI is a great career move right now.

According to Ryan, the last mile of analytics where BI really fits has been somewhat ignored in most modern data approaches and technical innovations. Tableau defined an era of BI, and while Tableau isn't going away, that era is coming to a close. So he believes things like data mesh need to focus more on the last mile of analytics - or at least specifically call out more often what changes, and how, in BI, whether we call that self-serve or anything else. What do the upstream changes from data mesh mean for BI? How can we take advantage of these improvements in data processing, ownership, quality, etc. to deliver actual insights to those that matter?

Ryan asked how we should actually do corporate BI in a data mesh type setup. Right now, pre data mesh, most organizations have centralized BI teams. And while domains will probably need some BI capabilities, a recurrent theme on the podcast is that most exec questions are not single-domain questions. If we build this huge amount of interoperable data, who is in charge of bringing it together at the corporate level? Who actually owns creating and maintaining C-suite and board-level dashboards and reports? Are we expecting execs to build those themselves? 🤣 And historically, central BI / data warehouse teams haven't really loved embedded BI teams, often calling them "shadow IT". Can they play nicely together? If yes, how?

There is often too much of a 'we'll cross that bridge when we come to it' attitude about BI in most modern data strategies according to Ryan. While BI may not actually be part of the exact data mesh implementation at some organizations - there are all kinds of definitions, and each org extends the self-serve platform to encompass different aspects of analytics - BI definitely should be part of your data strategy. Otherwise, what is the point of doing data work if not to, you know, generate actionable insights? This is especially an issue when trying to embed insights and analytics in general into other applications.

An important aspect of a data mesh implementation for Ryan is what is left to the BI tool and what is pushed down into the data product*. There isn't a lot of guidance or discussion happening on exactly what best practice is or how people are implementing embedded insights, just that it's the future. There needs to be far more conversation about who owns data throughout the data and analytics process lifecycle. Is it all on domains?

*Scott note: episode #40 w/ Xavier Gumara Rigol covers this well

For Ryan, there are a few reasons why the general data warehouse has not been as successful as it could be. To actually interact with the data warehouse, people need a decent level of technical and BI skill. So when it comes to data mesh, we don't want an even higher skill level to be required. Domains likely need to own some of their own BI, sharing insights from the domain in a consumable way to the rest of the organization. We can't only have domains sharing data; we need to be explicit about sharing information to a purpose. Is that to get the data, the insight, or the action that should be taken based on the insight? Scott note: there will be people in your organization who are only information consumers - taking info from dashboards and reports made for them. It's up to the organization to figure out whether that is acceptable or to try to upskill those people against their will.

While product-oriented thinking is picking up in more areas of data, it's still not very common in the BI world according to Ryan. Most BI teams have no real idea how useful any of what they produce is to the organization. Sometimes they don't even have the systems set up to monitor usage, and if they do, they typically don't understand how usage translates to value - or honestly even look at the usage metrics. That means BI teams lack the insights to make their BI assets as valuable as possible: what are the SLAs people need to make the most use of this data? How timely or accurate or complete does this need to be to drive the most value and be trustable? If BI people develop those relationships and that information flow, they can evolve what they produce to continue to meet needs. We need to stop producing work into the void; who is using this BI asset and why, and how can we make it better and/or less costly to produce? (A small sketch of usage tracking follows this paragraph.)
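Ryan's point that most BI teams don't even monitor usage suggests a simple starting point. Below is a minimal, hypothetical sketch of logging BI asset view events so a team can see which assets are actually used - the names and schema are illustrative, not from any specific BI tool:

```python
# Hypothetical sketch: record BI asset view events, then count distinct
# viewers over the last 90 days to find the "into the void" reports.
from collections import defaultdict
from datetime import datetime, timedelta, timezone

events = []  # in practice this would live in a database or event stream

def record_view(asset_id: str, user_id: str) -> None:
    events.append({"asset": asset_id, "user": user_id,
                   "at": datetime.now(timezone.utc)})

def usage_report(days: int = 90) -> dict[str, int]:
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    viewers: dict[str, set] = defaultdict(set)
    for e in events:
        if e["at"] >= cutoff:
            viewers[e["asset"]].add(e["user"])
    return {asset: len(users) for asset, users in viewers.items()}

record_view("quarterly-revenue-dashboard", "uid-123")
print(usage_report())  # {'quarterly-revenue-dashboard': 1}
```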
A really important aspect of designing usable BI is thinking about the actual user, in Ryan's view. It's often overlooked and is part of product thinking: what is the product-market fit for the BI asset / data product you are developing? How do people want to interact with it? How far along the pathway from raw data to insight to "so what" - the action that should be taken - is it the producer's responsibility to own and do the actual work? Does the consumer understand and agree? How can we generate things in data that create understanding and trust with the minimal amount of effort? Dashboards are great if you're monitoring a known question, but we also need to get far better at answering how we persuade someone with data.

Ryan believes BI people need to move past the idea that the output is the end of the process. That dashboard or report or whatever is a communication mechanism about what is currently known. But it's not as if there is a single answer that stays the same - 'how many customers do we have' is not a one-time-answer question. There needs to be a lot of work to bring more iteration into BI, especially collaborating with consumers as both sides learn more about the questions they are trying to answer. So BI teams need to move past being order takers and become part of the productization process around data and insights. Iterating in BI is not common practice but needs to be!

BI is probably the most important aspect of your data culture according to Ryan. That's simply because it's where the vast majority of your organization will interact with your data - at the BI level, whether that is generating insights or merely consuming them. BI should not be left until the end of your data strategy. But it often is, specifically because it is where there is the most human interaction and change management - and dealing with those pesky humans is the hardest part of change management. Don't put it off until the end, as that will just create more pain than is necessary.

Ryan said "begin with the end in mind." It's quite easy in data work to get lost in the how instead of the why - the how is the fun/interesting technical-challenge part. What are you trying to achieve? BI at the end of the day is not about generating insights; it's about supporting actions that help the business. So how do you build your BI strategy into your data mesh implementation strategy so you are not just making data available and trustable? How can you move past just generating insights? How do you think about the end - making smarter business decisions driven by trustable data - and work backwards? Don't lose sight of what you are trying to achieve.

Ryan wrapped up the conversation by returning to the crucial points of integrating product thinking into BI and upskilling. We can go further than we ever have before with data and BI, but we have to embrace new ways of working, on both the strategy/data culture side and the technology side. Just trying to advance your BI practice with technology alone is going to miss a whole lot of the value.

Quick tidbits:
- We have to figure out within the domain whether BI is a primary role or part of the necessary skillsets but not a separate role. It will probably be domain dependent, not even organization dependent, based on the complexity, size, and data value of the domain.
- Right now, most organizations want to embed insights into their general workflow applications - e.g., the CRM for the sales team - but they aren't doing the back-end work to make that possible.
- Putting a business analyst in the boardroom simply as a vehicle to live-answer questions usually causes more issues than it solves. It leads to a high-pressure environment where you can't be sure of quality. Scott note: fully agree. There's a long rant I have about this…

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Jan 18, 2023 • 15min
#182 Zhamak's Corner 15 - A Post Pipeline World? How to Build Data Products - and a Product Creation Ecosystem - That Inherently Create Trust
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Sponsored by NextData, Zhamak's company that is helping ease data product creation.

For more great content from Zhamak, check out her book on data mesh, a book she collaborated on, her LinkedIn, and her Twitter.

We wrap up another recording of Zhamak's Corner by talking about how we actually start to build data products in a post-pipeline data world. Data tools right now are kind of duct-taped to each other and duct-taped to the pipeline - how do we rethink things, starting from the end product - that mesh data product - and hook the tools to it to make interacting with it better? If you build a system that truly focuses on intentionality and responsibility that people can see, it creates trust. Away with the data black box!

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Data Mesh Radio episode list and links to all available episode transcripts here. Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf