Selecting the Initial Team
The speaker discusses their approach to selecting the initial team for their data mesh journey, emphasizing the importance of finding capable partners and individuals with a deeper understanding of data. They also highlight the benefits of empowering domains and gradually transitioning towards true data ownership.
Please Rate and Review us on your podcast app of choice!
Get involved with Data Mesh Understanding's free community roundtables and introductions: https://landing.datameshunderstanding.com/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Paul's LinkedIn: https://www.linkedin.com/in/paul-cavacas-32a36158/
In this episode, Scott interviewed Paul Cavacas, Senior Manager of Data and Analytics at Ocean Spray.
Quick note before jumping in: Ocean Spray is just at the beginning of their journey - in their pre-implementation phase - and there hasn't been a lot of resistance yet internally. That might make a few people jealous 😅 but there are a lot of interesting things Paul is doing to ensure that they are ready to decentralize what makes sense to decentralize at the right time. There is a lot to be gained from not rushing in. Also, apologies that Scott's audio is a bit weird; he had yet to build his makeshift sound studio in the Netherlands.
Some key takeaways/thoughts from Paul's point of view:
Paul started off with a bit about why they are headed down the data mesh path. For a large internal project, Paul had to become an expert on so many aspects of the company, and that's just not scalable in the long term, or when he's on vacation. So he has started to decentralize the data capabilities - slowly - as teams understand what data they will likely need to own in the long run. And he's not playing data ownership 'hot potato'; he's making sure they are prepared in the right ways.
At Ocean Spray, Paul shared that until recently everything tech - including data - was very centrally owned. In some areas, the IT team possibly knew more about the business processes than the business people in those domains did. So the company is going through all of its software and applications to decide what they should look like in the future. Data mesh fits well into that rethink because central ownership scales until it doesn't, and it limits flexibility.
As there isn't a rushed timeline, Paul has been able to put together a complete picture of what the data mesh roadmap will look like. But he also understands that it could look completely different as he learns more and starts actually implementing different aspects. Some existing data sets/assets could pretty easily become data products in the right environment, so that is where they are targeting first. They are working with the teams to transfer part of the ownership, especially around documenting use cases and SLOs. The central team is pairing with the domains to take existing data assets, decompose them into data products, and help people get on the path to real ownership.
Paul recommends what Brian McMillan talked about in depth in episode #26: finding people within domains who are at least somewhat tech savvy and want to advance their careers. Work with them to get them more and more up to speed. Ownership is not something that gets transferred in a day - treat it with more respect than that. So the approach is finding receptive people inside receptive domains. It won't always be easy, but why make buy-in complicated at the start if you don't need to?
Right now, Paul is building out some of the technical underpinnings of the platform they plan to build. Teams that want to move more quickly can start to test things out now, as long as they understand that things aren't fully automated yet and that they may have to change what they build now once the company fully moves to data products. One big piece he is anticipating is the need for testing and data contract mechanisms, but exactly how to do that is still an open challenge that will be learned along the way. He's expecting a workable but not perfect solution to start. Build to useful, then improve.
Paul circled back to the idea of finding the right partners over the right use cases/domains. Having engaged and excited partners, who know you can up their own data capabilities and drive value for them too, will make your early journey far easier than going for the most "valuable" data. You are also likely to get better feedback because they are bought in to collaborating with you! To find those partners, look at how teams present their results internally. If they present with lots of advanced figures and a certain flair around data, that is a great sign.
How much data ownership/work gets decentralized, and when, is a key remaining question for Paul. He's aware he'll have to test what works and iterate as he learns, but plenty of domains are too small to justify learning a ton about how to own data when there just isn't that much data/data work to deal with. There will be a shared ownership model between the central team and the domains. Scott note: this works up to a certain scale and in certain types of organizations. Shared ownership in a very large organization rarely works well for very long - there is too much political infighting and too many challenges - but it's an interesting pattern for smaller orgs that seems to be working well.
Paul's plan for assessing the quality of data products is a rubric scoring system: people rate each data product across multiple dimensions like usability, data quality, SLA compliance, etc. The scores, or how they are measured, may also change over time. At the start of a data product's life, while it's still in beta, those scores can be invaluable for iterating towards value, but consider throwing the historical scores out once it hits v1.0. The usefulness of feedback depends on what you are trying to achieve, and bad historical scores could hinder the success of a data product that is now very high quality and valuable.
For Ocean Spray, their first few data products are going to be source-aligned, combining a lot of important sales information. That way, people who want raw data can still get at it, and the team can build out more and more views/data products for users on top of those. The result is scalable, productized underlying production of the raw data plus more fit-for-purpose outputs for the different users.
Paul is not letting perfect get in the way of progress. Data contracts need to get to a place where they aren't locked onto schemas as something that can never change. But no one has really come up with a better solution yet, so schema-based contracts are where he's starting. It's better than nothing, so go with it while you figure out better ways.
Paul finished with a bit of advice around working with a few domains at the start of your journey. That way, you can take the learnings and understand the needs of multiple domains to abstract to a better solution for the organization, rather than one overly tied to a single domain's needs. Scott note: people seem pretty evenly split between working with one domain or 2-3 at the start of the journey. It's an interesting question.
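To make the rubric scoring idea from the takeaways a bit more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Ocean Spray's implementation: the class name `DataProductScorecard`, the 1-5 scale, the three dimensions (taken from the examples in the notes), and treating any version below 1.0 as beta are all hypothetical choices. The one behavior it mirrors from the episode is that beta-era ratings are thrown out when a data product reaches v1.0.

```python
from dataclasses import dataclass, field
from statistics import mean

# Hypothetical rubric dimensions, based on the examples mentioned in the episode notes.
DIMENSIONS = ("usability", "data_quality", "sla_compliance")


@dataclass
class DataProductScorecard:
    name: str
    version: str = "0.1"  # assumption: any major version below 1 counts as beta
    ratings: list = field(default_factory=list)

    @staticmethod
    def _major(version: str) -> int:
        return int(version.split(".")[0])

    def rate(self, scores: dict) -> None:
        """Record one reviewer's 1-5 scores across every rubric dimension."""
        missing = set(DIMENSIONS) - set(scores)
        if missing:
            raise ValueError(f"missing dimensions: {sorted(missing)}")
        if any(not 1 <= scores[d] <= 5 for d in DIMENSIONS):
            raise ValueError("scores must be between 1 and 5")
        self.ratings.append({d: scores[d] for d in DIMENSIONS})

    def release(self, version: str) -> None:
        """Bump the version; drop beta-era ratings once the product reaches v1.0."""
        leaving_beta = self._major(self.version) < 1 <= self._major(version)
        self.version = version
        if leaving_beta:
            self.ratings.clear()

    def summary(self) -> dict:
        """Average score per dimension, or an empty dict if nobody has rated it yet."""
        if not self.ratings:
            return {}
        return {d: mean(r[d] for r in self.ratings) for d in DIMENSIONS}


card = DataProductScorecard("sales_orders")
card.rate({"usability": 2, "data_quality": 3, "sla_compliance": 4})
card.rate({"usability": 4, "data_quality": 5, "sla_compliance": 4})
print(card.summary())  # averages across both beta ratings
card.release("1.0")
print(card.summary())  # {} because beta feedback was discarded at v1.0
```

Keeping the dimensions in a single constant makes it easy to add or change them later, which fits Paul's expectation that the scores, or how they are measured, will change over time.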
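Similarly, a schema-based data contract, the "better than nothing" starting point mentioned in the takeaways, can be sketched as a comparison between a contracted schema and a proposed one. Everything here is hypothetical: the `SALES_CONTRACT` columns, the `contract_violations` function, and the `strict` flag. `strict=True` models a fully locked schema, while `strict=False` is an assumed looser policy that tolerates added columns, one possible step away from schemas that can never change.

```python
# Hypothetical contract for a source-aligned sales data product: column name -> type name.
SALES_CONTRACT = {
    "order_id": "string",
    "sku": "string",
    "quantity": "int",
    "net_amount": "decimal",
}


def contract_violations(contract: dict, proposed: dict, strict: bool = True) -> list:
    """Return the ways a proposed schema breaks a schema-based data contract.

    strict=True treats the contracted schema as locked, so added columns also count.
    strict=False is an assumed looser policy that only flags removals and type changes.
    """
    problems = []
    for column, expected_type in contract.items():
        if column not in proposed:
            problems.append(f"removed column: {column}")
        elif proposed[column] != expected_type:
            problems.append(f"type change on {column}: {expected_type} -> {proposed[column]}")
    if strict:
        # Sorted so the report is deterministic regardless of set ordering.
        for column in sorted(proposed.keys() - contract.keys()):
            problems.append(f"added column: {column}")
    return problems


with_region = {**SALES_CONTRACT, "region": "string"}
print(contract_violations(SALES_CONTRACT, with_region))                # ['added column: region']
print(contract_violations(SALES_CONTRACT, with_region, strict=False))  # []
```

A check like this could run in CI before a data product publishes a new version, giving consumers an early warning rather than a silent break.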
Other unique factors of Ocean Spray:
The corporate structure is a co-op of growers so there isn't some massive pressure to grow at all costs.
Domains have been able to get access to other domains' data relatively easily for a long, long time. It hasn't been cleaned and prepared for them but there is an existing culture of sharing.
They are moving more and more to 3rd party applications rather than custom-built ones, which means data isn't necessarily in an easy-to-consume format by default (maybe not all that unique?).
Because many domains are quite small, the central team will likely still own most if not all of the data work for those domains.
Learn more about Data Mesh Understanding: https://datameshunderstanding.com/about
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf