
Data Mesh Radio
Interviews with data mesh practitioners, deep dives/how-tos, anti-patterns, panels, chats (not debates) with skeptics, "mesh musings", and so much more. Host Scott Hirleman (founder of the Data Mesh Learning Community) shares his learnings - and those of the broader data community - from over a year of deep diving into data mesh.
Each episode contains a BLUF - bottom line, up front - so you can quickly absorb a few key takeaways and also decide if an episode will be useful to you - nothing worse than listening for 20+ minutes before figuring out if a podcast episode is going to be interesting and/or incremental ;) Hoping to provide quality transcripts in the future - if you want to help, please reach out!
Data Mesh Radio is also looking for guests to share their experience with data mesh! Even if that experience is "I am confused, let's chat about this one specific topic" - yes, that could be you! You can check out our guest and feedback FAQ - including how to submit your name to be a guest and how to submit feedback, anonymously if you want - here: https://docs.google.com/document/d/1dDdb1mEhmcYqx3xYAvPuM1FZMuGiCszyY9x8X250KuQ/edit?usp=sharing
Data Mesh Radio is committed to diversity and inclusion, including in our guests and guest hosts. If you are part of a minoritized group, please consider this an open invitation to be a guest - just hit the link above.
If you are looking for additional useful information on data mesh, we recommend the community resources from Data Mesh Learning. All are vendor independent. https://datameshlearning.com/community/
You should also follow Zhamak Dehghani (founder of the data mesh concept); she posts a lot of great things on LinkedIn and has a wonderful data mesh book through O'Reilly. Plus, she's just a nice person: https://www.linkedin.com/in/zhamak-dehghani/detail/recent-activity/shares/
Data Mesh Radio is provided as a free community resource by DataStax. If you need a database that is easy to scale - read: serverless - but also easy to develop for - with many APIs including gRPC, REST, JSON, and GraphQL, all of which are OSS under the Stargate project - check out DataStax's AstraDB service :) Built on Apache Cassandra, AstraDB is very performant and, oh yeah, also multi-region/multi-cloud, so you can focus on scaling your company, not your database. There's a free-forever tier for poking around/home projects, and you can also use code DAAP500 for a $500 free credit (apply under payment options): https://www.datastax.com/products/datastax-astra?utm_source=DataMeshRadio
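If you want to poke at AstraDB from Python, a minimal connection sketch using the open source Cassandra driver might look like the following. This is only illustrative: the secure connect bundle path, client ID, and client secret are placeholders you would get from your own Astra account, and it assumes you have installed the cassandra-driver package.

```python
# Minimal sketch: connecting to a DataStax Astra DB instance with the
# open source Python Cassandra driver. The secure connect bundle path and
# credentials below are placeholders - download yours from the Astra console.
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

cloud_config = {"secure_connect_bundle": "/path/to/secure-connect-yourdb.zip"}
auth_provider = PlainTextAuthProvider("your_client_id", "your_client_secret")

cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
session = cluster.connect()

# Simple smoke-test query against the cluster metadata table
row = session.execute("SELECT release_version FROM system.local").one()
print(row.release_version)
```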
Latest episodes

May 20, 2022 • 1h 21min
#78 Minimum Viable Data Mesh? - Interview w/ Paul Andrew
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.
Paul's data mesh blog series: https://mrpaulandrew.com/tag/data-mesh-vs-azure/
Paul's LinkedIn: https://www.linkedin.com/in/mrpaulandrew/
Paul's Twitter: @mrpaulandrew / https://twitter.com/mrpaulandrew
In this episode, Scott interviewed Paul Andrew, Technical Architect at Avanade and Microsoft Data Platform MVP.
Paul started by sharing his views on the chicken-and-egg problem of how much of your data platform to build out, and when, to support data product creation and ongoing operations. Is it after you've built a few data products? Entirely before? That discussion becomes even more complicated in a brownfield deployment that already has existing requirements, expectations, and templates.
For Paul, delivering a single data mesh data product on its own is not all that valuable - if you are going to go to the expense of implementing data mesh, you need to be able to satisfy use cases that cross domains. The greater value is in cross-domain interoperability: getting to a data product that wasn't possible before. And you need to deliver the data platform alongside those first 2-3 data products, otherwise you create a very hard-to-support data asset, not really a data product.
When thinking about a minimum viable data mesh, Paul views an approach leveraging DevOps and CI/CD - Continuous Integration/Continuous Delivery - as crucial. You need repeatability/reproducibility to really call something a data product (a rough, hypothetical sketch of that kind of pre-deployment check follows these notes). In a brownfield deployment, Paul sees leveraging existing templates for security and infrastructure as code as the best path forward - supplement what you've already built to make it usable for your new approach. You've already built out your security and compliance model; turn it into infrastructure as code to really reduce friction for new data products.
For Paul, being disciplined early in your data mesh journey is key. A proof of concept for data mesh is often focused only on the data set or table itself - not on actually generating a data product, much less a minimum viable data mesh. It's easy to put yourself in a very bad spot: taking that from proof of concept to actual production is going to be a very hard transition, and telling users it will take weeks to months to productionalize is probably not going to go well. Be disciplined enough to go far enough to test out a minimum viable data mesh.
Paul emphasized the need for pragmatism in most aspects of implementing a data mesh. Really think about when to take on tech debt and do so with intention. When shouldn't we take on tech debt? How do we pay down tech debt, and when? There is a balance between getting it done and technical purity. How do we choose what features to sacrifice? What is the time-value-of-money aspect, or how much importance do we place on getting it done sooner rather than more completely?
These are questions you'll need to ask repeatedly.
Similar to what previous guests mentioned, Paul is working to encourage and facilitate the data product marketing and discovery process - discussing with data consumers what they want, pie-in-the-sky thinking included. Then he takes that to the data producers to figure out pragmatic approaches and what is simple to deliver. Is one aspect going to be very difficult? Go back to the consumers and let them know it will delay delivery and they need to fund that aspect. Do they still want it? Use that back-and-forth discussion to drive negotiations toward a valuable solution with less effort. Look for that return on investment. Be pragmatic!
Paul recommends making business value your general data mesh 'North Star'. Ask the pragmatic questions - shift the data function from taking requests/requirements to leading those negotiations. Have the conversation of "Is this worth it? Who is going to pay for it? What is it worth to them?" As of now, Paul and team are still often functioning as the translator between data producers and data consumers.
But when discussing the goal of getting out of the middleman/translator role, Paul pointed to a few signs that an organization is ready for producers and consumers to work with each other directly. Some aspects are general company culture, how data literate/capable the execs are, data platform maturity, etc. If you can mature your organization's approach and skill, you can move towards not needing a data translator.
Paul talked about how to think about your data mesh journey and its different elements, even a data product, in a crawl, walk, run fashion. Think about your data products first and foremost as serving at least one specific purpose. Still create with reuse in mind, but they should have a use case to serve and can expand from there. At the mesh level, part of crawling is getting to a few standard interfaces for data products to use to communicate. At the data platform level, part of crawling is getting to a place where it is possible to publish new data sets, while walking might be a significant reduction in friction to data product production. While this means that a minimum viable mesh is still a pretty high bar, you can get to a place that is comfortable being at a crawling stage with a good roadmap towards walking and running. Done and good is better than perfect and the forever "just 3 more weeks".
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
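As a loose illustration of Paul's repeatability point, here is a minimal sketch of a CI-style gate that refuses to "productionize" a data product definition unless it carries the basics: an owner, an output port/interface, infrastructure declared as code, and an SLA. The field names and rules are hypothetical assumptions for illustration, not a standard from the episode.

```python
# Illustrative only: a tiny CI-style gate for a data product definition.
# Field names below are hypothetical, not a data mesh standard.
REQUIRED_FIELDS = {"name", "owner", "domain", "output_port", "infra_template", "sla"}

def validate_data_product(definition: dict) -> list[str]:
    """Return a list of problems; an empty list means the definition passes."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS - definition.keys()]
    if definition.get("infra_template") == "manual":
        problems.append("infra must be declared as code, not provisioned manually")
    return problems

if __name__ == "__main__":
    draft = {
        "name": "orders_daily",
        "owner": "orders-team@example.com",
        "domain": "orders",
        "output_port": "parquet://lake/orders/daily",
        "infra_template": "terraform/orders_daily",
        "sla": {"freshness_hours": 24},
    }
    issues = validate_data_product(draft)
    print("OK to deploy" if not issues else issues)
```

The point of a gate like this is not the specific fields; it is that the check runs automatically on every change, which is what makes the data product repeatable rather than a one-off asset.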

May 17, 2022 • 17min
#77 Why DO Data Warehouse Fans Fear Data Mesh So Much? - Mesh Musings 15
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

May 16, 2022 • 1h 11min
#76 A Skeptic's View of Data Mesh and Learning Your Data Product ABCs - Interview w/ Tim Gasper
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.
Tim's Twitter: @TimGasper / https://twitter.com/TimGasper
Catalog & Cocktails page: https://data.world/podcasts/
Data.world blog content:
Do You Know Your Data Product ABCs? https://data.world/blog/data-product-abcs/
The Role of a Data Catalog in Data Mesh https://data.world/blog/data-catalog-data-mesh/
In this episode, Scott interviewed Tim Gasper, VP of Product at data.world and co-host of the Catalog & Cocktails podcast. They covered two main topics: 1) the skeptic's view of data mesh and 2) Tim's / the data.world team's "ABCs of Data Products" framework.
Skeptics have a few main pushbacks on data mesh in Tim's view. Tim listed the top 6 that he sees and then discussed them with Scott.
#1: Data mesh isn't for every organization, depending on size, number of domains, data/problem space complexity, etc. Tim said this. Zhamak has said this. Most data mesh advocates/fans say this regularly. One of the myths of data mesh is that it's designed for everyone. Don't go to a decentralized data setup if you don't need to. Tim made the very good point that we need more conversations and better guidance on what to measure to determine whether centralization of your data team and processes is your actual challenge.
#2: Tooling doesn't exist - yet? - to make it easy for domains to take over data ownership. A big conceptual myth of data mesh is that it has to solve every data problem, even the most difficult, right out of the gate. Tim mentioned that your team needs to really think about self-service being about empowerment, not necessarily a single big red easy button. And your implementation will evolve - it MUST evolve. It's not easy yet, and if your team isn't prepared to roll up their sleeves, it's okay to wait to implement.
#3: There shouldn't be anyone who "owns" the data. Tim made a really good point here about accountability for sharing your data versus the "fiefdom" model, where someone has complete control over how the data is used. Yes, someone shouldn't be able to prevent other domains from using data. But that's not at all in the spirit of data mesh anyway. Why would you make data reusable and discoverable if people can't use it?
#4: There aren't enough case studies yet. Tim mentioned this briefly. It is a bit of a chicken-and-egg issue: if we wait for people to be "done" with their journeys, it will be another 5 years before good case studies emerge. It's okay to need more proof before wanting to go forward, but it might mean lost opportunity. And there are good examples out there, including guests from this podcast (20+ so far).
#5: Lacking guidance on exactly how to handle cross-domain data combinations. Tim mentioned the question of how those combinations get managed, as right now, in a data warehouse or data lake world, there are clear owners - the data team.
Unfortunately for those who want a direct data mesh playbook, this is situational - you have to figure it out for each situation and be ready to evolve.
#6: Data mesh will create data silos. Sure, if you have the data mart model of old where data is created only for the domains to use internally. But that's not data mesh. Tim talked about how important iteration and collaboration are to preventing data silos. So much is about the intent to not let data silos become a problem and to iterate towards interoperability.
Overall, Tim and Scott agreed that a lot of the pushback probably comes from orgs where data mesh would create a lot of friction with their existing cultures - and as Tim said, changing culture is very hard and "fixing" culture is even harder.
Tim talked about how we too often think about data implementations, whether macro or micro, as a singular event, something that doesn't evolve. Data implementations aren't a house; they're more like a garden. Seasons change, you might have to weed a bit - or a LOT - and you might change the focus of your garden. Are you sick of zucchini? Is this data product or report/dashboard no longer relevant?
The Data Product ABCs framework: Tim and the team at data.world put together a framework for thinking about data products. An important aspect is that this, like much of data mesh, isn't about providing specific answers but rather the questions you must answer to get to a good outcome. A key point Tim made at the end was just how many data challenges come from implicit expectations and knowledge versus getting very explicit to make sure everyone is on the same page and that knowledge is shared and documented. Tim basically said get in the room -> negotiate -> come to a conclusion and shake hands -> document. (A hypothetical sketch of what making such an agreement explicit could look like follows these notes.)
A is for Accountability - Who owns the data product? And what does ownership specifically mean?
B is for Boundaries - What is a data product? What interfaces does it use? Crucially, what isn't a data product? And what isn't part of a specific data product?
C is for Contracts - What are the explicit expectations of this data product? Who can use it? What are the SLAs? Abe Gong mentioned in his episode, #65, how often these contracts at least start as implicit - let's get communicating and negotiating, folks!
D is for Downstream - Who uses the data product, who might want to use it, and why? What is the roadmap? Etc.
E is for Explicit Knowledge - Because ABCDK doesn't sound as good? Don't believe your data products are self-describing. Document things; explain in detail. What are the relationships to other data products or to concepts outside the data product?
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
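As a loose illustration of turning implicit expectations into something explicit and documented, here is a minimal sketch of a data contract object touching each of the ABCDE questions. The field names and example values are hypothetical assumptions, not a data.world or data mesh standard.

```python
# A minimal, hypothetical sketch of making a data product contract explicit
# rather than implicit - roughly mapping to the ABCs framework's questions.
from dataclasses import dataclass

@dataclass
class DataContract:
    product: str                   # B: what this data product is
    owner: str                     # A: who is accountable, and for what
    allowed_consumers: list[str]   # C: who may use it
    freshness_sla_hours: int       # C: an explicit expectation
    schema: dict[str, str]         # E: explicit, documented knowledge
    downstream_notes: str = ""     # D: known/expected consumers and roadmap

orders_contract = DataContract(
    product="orders.daily_summary",
    owner="orders-domain-team",
    allowed_consumers=["finance", "marketing-analytics"],
    freshness_sla_hours=24,
    schema={"order_id": "string", "order_total": "decimal", "order_date": "date"},
    downstream_notes="Finance month-end close; marketing attribution (planned).",
)
print(orders_contract)
```

Whether this lives in code, YAML, or a catalog entry matters less than the negotiation that produces it: get in the room, agree, then write it down where consumers can find it.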

May 15, 2022 • 26min
Weekly Episode Summaries and Programming Notes - Week of May 15, 2022 - Data Mesh Radio
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

May 13, 2022 • 1h 22min
#75 Let's Get Intentional With Data: DDD for Data, Hyper Objects and More - Interview w/ João Rosa
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.
João's Twitter: @joaoasrosa / https://twitter.com/joaoasrosa
João's personal site: https://www.joarosa.io
Kent Beck talk at DDD Europe 2020: https://www.youtube.com/watch?v=3gib0hKYjB0
Timothy Morton's book on hyperobjects: https://www.upress.umn.edu/book-division/books/hyperobjects
In this episode, Scott interviewed João Rosa, Principal Consultant at Xebia. They discussed domain driven design for data, the importance of intentionality in preventing chaos, being effective instead of efficient, and the concept of a hyper object.
To start at the end, João talked about the need to embrace complexity when dealing with software - and we need to treat data and analytics as a software process. If we try to abstract away the complexity, we lose the nuance, and that nuance is what can make all the difference in the value of your data. Software is not like manufacturing, where complexity is very costly.
This was a pretty broad-ranging conversation starting with Domain Driven Design - or DDD - for data. João believes we should apply the principles of DDD to everything controlled by software - and when thinking of data as a product, data is definitely controlled by software. One of the big challenges with bringing something like DDD to data is that there aren't tools - and most challenges in the data space have historically been addressed with a tool-first approach. There is a desire to move quickly and just solve challenges, but it's not possible to do that with DDD in João's view. A very interesting point of view João holds is that developing software is a learning process and working software is a consequence of that learning.
With the move to cloud and the easy consumption of new tools, creating data is very easy. But João believes that in an enterprise, there need to be very clear boundaries and contracts between domains to prevent overlap and confusion. The conversations between teams are hard because all of them are context-dependent. Even at the software level, your interface to your data products is a form of communication (a small illustrative sketch of an explicit domain interface follows these notes).
João brought up the manufacturing-oriented philosophy of software development and why it causes so many challenges. It is very much about efficiency and lean development. That works well when you are producing physical goods, but he doesn't think it does for software. Small incremental changes to software are not costly in a CI/CD world, but the creation of software is expensive. So we need to move away from the manufacturing approach. But that would mean management releasing more control, which many are not willing to do. For João, there is also major value in discovery about what you've already deployed. How are people using it? What is the market / consumer base telling us? In general, we spend far too much time focused on new features and not enough discovering new things about what is already in production.
And those small incremental improvements are often the things that generate real value - if the investment is small and the returns are good, those small changes are a significant point of potential value leverage. João brought up Kent Beck, who said "once software arrives to production, it changes itself". Measuring that feedback is crucial. Data mesh, if done well, can really set up organizations to succeed because it can make people effective rather than merely efficient - we create data products that are easy to use but have unexpected consumption. People can discover new things. We lower the friction to new, useful insights. Efficiency is doing the task at hand with little waste - but is that effective in creating business value?
Intentionality is a key theme for João - if you have autonomy without direction, it can create chaos. In her episode, Jessitron (Jessica Kerr) mentioned the need for agency instead of autonomy. Autonomy is "you figure it out"; João quoted Jessitron as saying "you provide me the direction but not the path". We should also be constantly assessing what we are trying to accomplish and whether we are actually headed in that direction. What is the business problem you are trying to solve? Apply intentionality to your work to stay focused on the real goals.
When ~80% of our time is spent trying to code and only ~20% is spent on setting our intentions, what is the outcome? João believes that if we flipped that - focusing much more on what we are trying to achieve and solidifying the communication before going and coding it - we would have a far better outcome.
Right now, João believes that data is where DevOps was about five years ago - we still, as an industry, need to build the body of knowledge on how to do this right. The DevOps engineer title is starting to fade and we are calling them what they are - platform engineers. But as with DevOps, we need to look at the long-term payoff of building a platform - not all organizations should build a platform!
João brought up the need to think about the long-term viability of all data initiatives, not just the platform. Data products must be sustainable - which is why so many guests have recommended starting with source-aligned data products. Lorenzo Nicora does a great job explaining why in his episode. One of João's clients leases large industrial equipment and has switched to proactive maintenance instead of waiting for things to break and fixing them then. This has created a more reliable service for customers and lowered maintenance-related downtime costs. How can we apply that to data?
A hyper object is an object that spans time and space. João sees data as a hyper object, but we typically think of data as a snapshot in time. How do we store data today to answer the questions of tomorrow? And how do we apply intentionality to data so we stop storing data just for the sake of storing it? This philosophy better enables us to think of data as a product and reason about the evolution of a data product.
To attempt to sum up João's thoughts: focus more on intentionality - why are we doing something and is it working? - embrace complexity, and look to solve more through conversation instead of tooling.
Data Mesh Radio is hosted by Scott Hirleman.
If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
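As a loose illustration of João's point that a data product's interface is a form of communication, here is a small sketch of a domain exposing an explicit, typed port instead of letting consumers reach into its internal tables. All the names (ShipmentRecord, ShipmentsPort, the status vocabulary) are hypothetical, invented for this sketch.

```python
# Illustrative sketch: a domain publishes an explicit, typed port (its contract)
# and consumers code against the port, never against the domain's internals.
from dataclasses import dataclass
from datetime import date
from typing import Protocol

@dataclass(frozen=True)
class ShipmentRecord:
    shipment_id: str
    customer_id: str
    shipped_on: date
    status: str  # agreed vocabulary, e.g. "in_transit", "delivered"

class ShipmentsPort(Protocol):
    """The published contract of the shipping domain's data product."""
    def shipments_between(self, start: date, end: date) -> list[ShipmentRecord]: ...

def delivered_rate(port: ShipmentsPort, start: date, end: date) -> float:
    """A consumer computes a metric using only the published interface."""
    records = port.shipments_between(start, end)
    if not records:
        return 0.0
    delivered = sum(1 for r in records if r.status == "delivered")
    return delivered / len(records)
```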

May 11, 2022 • 14min
#74 What is Data Mesh Trying to Achieve? - Mesh Musings 14
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Scott shares some thoughts on recent data mesh FUD and market confusion.
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

May 10, 2022 • 1h 2min
#73 Ship-Posting and Cake Recipes: Measuring the Return of Your Data Initiatives - Interview w/ Katie Bauer
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.
Katie's LinkedIn: https://www.linkedin.com/in/mkatiebauer/
Katie's Twitter: https://twitter.com/imightbemary
In this episode, Scott interviewed Katie Bauer, a Data Science Manager at Twitter in their Core-Tech group. To be clear, she was not representing Twitter, only her own opinions. The main topic of discussion was how to measure the value and success of your data projects/implementations.
Some very useful advice from Katie that can feel obvious when said but is very often and easily overlooked: measure what would drive you to act. If a result 10x higher than expected, or 90% below expected, isn't going to change your decision, it may be interesting information, but is it really important? If not, don't waste the time measuring it, especially early in your data measurement maturity. The point is to get to an objective evaluation, not overly precise measurements. Set yourself up to improve and iterate. Don't make this hard on yourself.
She also gave the pithy statement: what is valuable is not necessarily valued.
Katie has a cake analogy that maps well to data maturity. Think about your need and the other person's capability when it comes to making a cake. Do you need a fancy cake for a wedding, or is this for a 3-year-old's birthday party? One you probably want to be special; for the other, if it vaguely resembles something from TV and tastes decent, the consumer will probably be happy. Is the other person capable of making a super fancy layered red velvet cheesecake, or is a cake mix from a box more up their alley? How mature are the parties at creating measurement data, and how mature or advanced does the output need to be?
Katie started the conversation talking about survivorship bias and other biased ways of measuring. Often, she has seen throughout her career that people having success seek to prove that success via metrics instead of finding the metrics that matter most. That has some pretty obvious flaws, so we need to move towards better measurement practices. For Katie, measuring the value of data science is pretty meta.
Katie recommends starting out with some really easy measurements around engagement and usage (a rough sketch of this kind of starter metric follows these notes). If it's a platform, what are your daily, weekly, and/or monthly active users - and which is actually the most useful metric? Should people actually be leveraging your project daily? Think about what your addressable market is and what percent of that market you have. And note that NPS (net promoter score) is a very lagging indicator.
When thinking about metrics, two things really stand out to Katie. First, what is your useful granularity? Don't get overly precise if you don't need to. You want an objective evaluation, and anything past that can become overkill, which has an inherent cost. Second, what is your useful time scale?
Is this on a micro scale, where the task should take 5 minutes to complete so a difference of 5 minutes is a big deal? Or is it a much longer time scale?
When thinking about what to measure, ask yourself what your company values. Is it shipping, usage, cleaning up tech debt/deprecation, etc.? Katie threw out a bit of a mind bender: what is valuable is not necessarily valued. So think about what people care about regarding information flow. It might not be the most valuable information, but it might be highly valued - or vice versa. At the end of the day, are you there to be right or to serve your constituents?
Katie's getting-started advice on measurement includes starting with something concrete - use that initial measurement as a learning stepping stone. She mentioned that it can be hard to recover from measuring the wrong thing or getting your measurement wrong - people can jump to "measuring is bad" - so set expectations up front that you will iterate on your metrics.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
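To make the "start with easy engagement measurements" advice concrete, here is a rough sketch computing daily/weekly/monthly active users of a data platform from a usage log, plus penetration of the addressable audience. The event format, sample data, and 25-user addressable market are illustrative assumptions, not from the episode.

```python
# Rough sketch: DAU/WAU/MAU and addressable-market penetration from usage events.
from datetime import date, timedelta

# (user_id, date_of_activity) events - in practice these would come from query
# logs, BI tool audit logs, etc. Values here are made up for illustration.
events = [
    ("ana", date(2022, 5, 9)), ("ana", date(2022, 5, 10)),
    ("raj", date(2022, 5, 3)), ("li", date(2022, 4, 20)),
]
addressable_users = 25  # e.g. analysts who *should* be using the platform

def active_users(events, as_of: date, window_days: int) -> int:
    """Distinct users active in the trailing window ending at as_of."""
    cutoff = as_of - timedelta(days=window_days)
    return len({user for user, day in events if cutoff < day <= as_of})

as_of = date(2022, 5, 10)
dau = active_users(events, as_of, 1)
wau = active_users(events, as_of, 7)
mau = active_users(events, as_of, 30)
print(f"DAU={dau} WAU={wau} MAU={mau} penetration={mau / addressable_users:.0%}")
```

Which of the three windows matters is Katie's real question: if people shouldn't be in the platform daily, DAU is the wrong thing to obsess over.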

May 9, 2022 • 1h 23min
#72 Reliability in Data Mesh: Why SLAs and SLOs are Crucial - Interview w/ Emily Gorcenski
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.
Emily's LinkedIn: https://www.linkedin.com/in/emily-gorcenski-0a3830200/
Emily's Twitter: @EmilyGorcenski / https://twitter.com/EmilyGorcenski
Emily's Polywork profile: https://www.polywork.com/emilygorcenski
Emily's website: https://www.emilygorcenski.com/
Alex Hidalgo's Implementing Service Level Objectives book, as mentioned: https://www.alex-hidalgo.com/the-slo-book
In this episode, Scott interviewed Emily Gorcenski, Head of Data and AI at Thoughtworks Germany. Emily has put out some great content related to data mesh.
As a data scientist by training, Emily has a data consumer bent in her views on data mesh. She is therefore often focused on how data mesh can help "me" (her) as a data consumer.
SLAs and SLOs come right out of the site reliability engineering playbook from Google. Overall, systems reliability engineering practices are crucial - Emily asked, why don't we bring the rigor of other engineering disciplines to software engineering?
So, what is an SLA and an SLO? Per Emily, an SLA is a contract between two parties - hence "agreement" in the name. This agreement should be written around an SLO, with the SLO serving as a specific target. That target can be uptime or latency in the microservices realm, but with data, SLOs can get a little - or a lot - more tricky.
The theory behind developing an SLO is for it to directly connect to business value. Emily believes that when we think about SLOs and data, we shouldn't apply SLOs directly to the data but should shift those SLOs to the left and have SLOs in the software engineering practice that apply to data.
Emily mentioned another antipattern for SLAs in general: not connecting them to SLOs. But when it comes to data, most teams don't have any SLAs at all, connected to an SLO or not. As an industry, software engineering has figured out how to offer great SLAs to external parties, but many organizations still struggle to offer good SLAs internally.
For Emily, software-focused SLAs can even result in worse outcomes for data. If an SLA is about uptime, it might result in pushing bad data into a system so a service can maintain its SLA.
When developing SLAs, Emily recommends starting with conversations and negotiations between both parties. If five 9s of uptime is not valuable to your consumers, why build to ensure five 9s? Dig into actual user needs and what will actually drive user value. And start to differentiate between infrastructure-focused SLAs - like whether the data product is available - and data SLAs - like whether the data is updated and meets quality thresholds (a small illustrative sketch of a data-level SLO check follows these notes).
Emily then started to talk about some of the fun, very specific SLAs around data and what data availability even means. These SLAs can get complicated, but they can start to really drive towards what is actually valued by the consumers - what the actual value of the data is - so you can then start to negotiate to drive a high return on investment.
Again, we can avoid pre-optimizing for facets that consumers don't care about.
Per Emily, good SLOs will tell you what you should improve. We should make sure our SLOs are decomposable so we can get quite specific when useful and/or necessary. This is much more difficult to do in data than in general software engineering - we can't think about data in a binary way, such as accurate or not; it is much more of a continuous spectrum. Emily recommends looking at the error budget concept and thinking about how we can apply it to data.
Emily believes SLOs can help you avoid building unnecessary complexity - if your users don't need real-time results, don't build a real-time system. It's the conversations and negotiations that take you from the state of what's possible to what's valuable. We should use SLOs to align closely with the use case - there is definitely such a thing as good enough. And don't create Franken-Data-Products - monstrosities that try to solve every single need. It's fine to have two similar data products that serve two distinct needs.
For Emily, data consumers keep complaining to a centralized data engineering team. People on that centralized team are the unfortunate middle-people with little power to change what consumers are getting. We should move SLOs to being the responsibility of the software development teams - the domains - much like we do data ownership in data mesh. Once an organization learns to do SLOs well, Emily recommends extending that to SLOs around the data platform - but don't mistake the SLOs and SLAs around infrastructure for those around data products, as mentioned earlier.
Emily believes the governance team also has a responsibility to drive standardization around SLOs. This includes sensible defaults.
What should we learn in the data space from DevOps? For Emily, the philosophy of resilience is crucial. Repeatability and safety through continuous integration / continuous delivery - or CI/CD - is a major driver of value in software engineering. How can we apply it to data?
In data, we all too often use a systems-oriented approach, so we don't attribute value well, per Emily. How can we measure the value of being able to do ad hoc analysis? Not the value of the analysis itself, but almost the inverse of opportunity cost - what is the opportunity value? If we remove the abstractions, can we get to a specific value measurement?
Emily believes we need to get much more serious about creating good data about our data practices. It takes a fair bit of effort to get to a place where we can repeatedly get good, usable data on our data initiatives at scale. We also need to give people more slack in their work time to chase down additional information. Serendipity can only strike if people have the room to create it and then react to it.
Emily wrapped up her thoughts on a few points. First, the pace of change of business has accelerated significantly, and it requires us to philosophically reorient how we think about data. There needs to be more space for people to do the new, necessary work that drives high incremental value - but because everyone is so overloaded already, that isn't happening in most organizations. Second, start from the consumers and their needs and work backwards. It's okay not to create every piece of potentially useful data in a usable fashion upfront. Figure out the needs you know about and build towards those - additional use cases will emerge.
Data Mesh Radio is hosted by Scott Hirleman.
If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
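To illustrate the distinction between infrastructure SLOs and data SLOs, here is a hedged sketch of a data-level freshness SLO with a simple error-budget calculation in the spirit of the SRE practice Emily references. The 6-hour freshness objective, the 99% target, and the 30-day/hourly sampling are illustrative assumptions, not numbers from the episode.

```python
# Hedged sketch: a data freshness SLO plus a basic error-budget check.
from datetime import datetime, timedelta

FRESHNESS_OBJECTIVE = timedelta(hours=6)   # SLO target: data no older than 6h
SLO_TARGET = 0.99                          # 99% of checks must meet the objective

def freshness_ok(last_updated: datetime, now: datetime) -> bool:
    """One sample of the SLI: is the data product fresh enough right now?"""
    return (now - last_updated) <= FRESHNESS_OBJECTIVE

def error_budget_remaining(check_results: list[bool]) -> float:
    """Fraction of the period's allowed failures not yet consumed."""
    allowed_failures = len(check_results) * (1 - SLO_TARGET)
    failures = check_results.count(False)
    if allowed_failures == 0:
        return 1.0 if failures == 0 else 0.0
    return max(0.0, 1 - failures / allowed_failures)

# e.g. hourly checks over 30 days -> 720 samples; pretend 5 came back stale
results = [True] * 715 + [False] * 5
print(f"error budget remaining: {error_budget_remaining(results):.0%}")
```

A burned-down error budget is a signal to negotiate: either invest in improving freshness or revisit whether the objective actually reflects consumer value.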

May 8, 2022 • 32min
Weekly Episode Summaries and Programming Notes - Week of May 8, 2022 - Data Mesh Radio
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

May 6, 2022 • 1h 2min
#71 Adventures in Data Maturity - Creating Reliable, Scalable Data Processes - Interview w/ Ramdas Narayanan
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.
Ramdas' LinkedIn: https://www.linkedin.com/in/ramdasnarayanan/
In this episode, Scott interviewed Ramdas Narayanan, Vice President, Product Manager of Data Analytics and Insights at Bank of America. To be clear, he was not representing the company and was sharing his own views.
Ramdas came on to discuss lessons learned from building effective data sharing at scale on the operational plane over the last 5-10 years, so we can apply those lessons to our data mesh implementations. A key output of the conversation is a guiding principle for getting data mesh right: your goal is to convert data into effective business outcomes. It doesn't matter how cool or not cool your platform is or anything else - drive business outcomes! It's easy to let that get lost in the tool talk and everything else around data mesh.
Per Ramdas, when looking at creating a data product, or really any data initiative, you need to align first on business objectives, and that will drive funding. In the financial space, that is direct, literal funding, but even outside it, you should have the same mindset. Make sure you get engagement and alignment across business partners, technologists, and subject matter experts. How are you using technology to address or solve the business problem?
Ramdas has seen that if you don't focus on creating reusable data, you can create silos - you need cohesive data sets, not bespoke data sets for every challenge, as that just doesn't scale. You should also study the data sources you are using: is there additional useful data you could add to your data set, or could you use that data for other purposes? Keeping an eye out for additional data to drive business value will really add a lot to your organization.
When working with developers, Ramdas recommends helping them understand how the business is going to consume and use the data, and then figuring out whether they should deliver data as something like an API or web service or as more of a custom batch delivery. It is also important to work with data consumption teams to be reasonable in their consumption demands - getting them to modernize can be a challenge, and that can put an unreasonable burden on producing teams.
Ramdas talked about how crucial conversations and culture are to getting data projects/products right. Sometimes the conversations can be tough, but often they really aren't, and there just needs to be an open exchange of context and information, especially alignment on business objectives. Projects that fail typically have poorly defined business objectives or lack alignment.
Per Ramdas, it is important to educate the business people on what data exists and even what data doesn't. A clouded view of what data is available creates a lot of frustration - we need to get better in general at data discoverability so the business folks can know what is available and get access easily.
Ramdas has seen repeatedly that good context via rich metadata also leads to better context sharing at the person-to-person level, as it generates additional conversations.
To emphasize that point a bit more, Ramdas believes that data discovery is the main spark for sharing context. Otherwise, we are at best exchanging data as 1s and 0s instead of the actual information. Ramdas believes everyone needs to understand how information flows through your systems - it can help you better understand the art of the possible and also identify gaps in how you will approach your challenges. Start your projects, whether that is a new data product, a new platform feature, or anything else, by having a lot of information architecture meetings. After that, start to focus on data discoverability. "Show and tell" sessions have worked well for him, as they spark new thoughts and can help surface issues.
Ramdas wrapped up on a really crucial part of data maturity: the curiosity factor. Always be asking why you are doing something. What problem are we actually trying to solve? Do we have the capabilities to solve it? How does the data flow through our systems? Can we push data quality upstream to prevent quality issues instead of remediating them (a small illustrative sketch follows these notes)? What guardrails can we put in place to prevent issues? How can we enrich our metadata to make this data even more valuable? Etc.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
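As a small illustration of pushing data quality upstream, here is a sketch of a validation gate applied at the point of production/ingestion, so bad records are quarantined before they ever reach downstream consumers. The record shape and the three rules are hypothetical examples, not rules from the episode.

```python
# Illustrative sketch: validate records at ingestion and quarantine failures,
# rather than remediating bad data downstream. Rules below are hypothetical.
def validate_record(record: dict) -> list[str]:
    errors = []
    if not record.get("account_id"):
        errors.append("account_id is required")
    if record.get("amount") is None or record["amount"] < 0:
        errors.append("amount must be a non-negative number")
    if record.get("currency") not in {"USD", "EUR", "GBP"}:
        errors.append("currency must be one of USD/EUR/GBP")
    return errors

def ingest(records: list[dict]):
    """Split a batch into accepted records and quarantined (record, errors) pairs."""
    accepted, quarantined = [], []
    for record in records:
        errors = validate_record(record)
        if errors:
            quarantined.append((record, errors))
        else:
            accepted.append(record)
    return accepted, quarantined

good, bad = ingest([
    {"account_id": "A1", "amount": 10.5, "currency": "USD"},
    {"account_id": "", "amount": -3, "currency": "XYZ"},
])
print(f"accepted={len(good)} quarantined={len(bad)}")
```

The quarantine list doubles as the feedback loop to the producing team - the guardrail and the conversation starter Ramdas describes.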