One of the main ideas discussed in this podcast episode is the importance of building value early in a data mesh implementation. The speaker emphasizes that simply creating a data mesh does not automatically generate value, and that it is crucial to develop a clear thesis around what will drive value early on. This involves considering factors such as data discovery, data consumer trust, and the specific outcomes that the implementation aims to achieve. By focusing on building a high-quality data quantum, which is a component of the data product, organizations can provide faster and easier data discovery, ensure trust in the data, and deliver valuable insights to data consumers.
Another key point discussed in the podcast is the significance of standardized APIs and metadata in the context of data mesh. The guest speaker highlights the importance of having control, observability, and discovery APIs that are standardized across data quanta or data products. This allows for easier adoption of new data quanta and simplifies the data consumer experience. The standardized APIs provide a common interface for accessing metadata and other relevant information, ensuring that data consumers have a clear understanding of the data and can trust its quality and usability. By adopting an API-first approach and designing APIs that serve a purpose, organizations can enhance their data mesh implementation.
The podcast episode explores the concept of combining operational and analytical workloads within the data mesh. The speaker suggests that it is feasible to have a data store that supports both types of workloads, emphasizing that data engineering and software engineering should not be artificially separated. While acknowledging the distinction between operational and analytical needs, there is an emphasis on considering how both types of workloads can coexist effectively within the data mesh. The episode encourages organizations to think beyond traditional boundaries and leverage the capabilities of the data mesh to support different operational and analytical use cases. This approach requires careful planning and coordination, but can result in more holistic and efficient data management.
One of the main ideas discussed in the podcast is the importance of effective communication and establishing clear data contracts. The speaker emphasizes the need for higher-level communication to understand the goals and achievements of data projects. They highlight the significance of data contracts in providing a structure and framework for data consumption. By using APIs to make data contracts more accessible, the speaker suggests that it becomes easier for consumers to understand and utilize the data. The discussion also touches on the idea of allowing consumers to request modifications to data contracts, which can lead to customized contractual obligations and increased trust between data producers and consumers.
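To make that concrete, here is a minimal Python sketch of what a machine-readable data contract could look like; the field names, and the idea of fetching it from a quantum's API endpoint, are illustrative assumptions rather than the exact structure discussed in the episode.

    from dataclasses import dataclass, field

    @dataclass
    class DataContract:
        # Illustrative terms a producer might publish for a data quantum.
        quantum: str                                  # which data quantum this covers
        version: str                                  # contract version
        schema: dict                                  # column name -> type
        freshness_hours: int                          # max acceptable data age
        quality_checks: list = field(default_factory=list)

    # A consumer could then fetch this through the quantum's API instead of
    # reading static documentation, e.g. GET <quantum-url>/contract (hypothetical).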
Another key point discussed in the podcast is the concept of data quanta and their role in building trust and value. Each data quantum maps one-to-one to a domain: it represents a specific domain and can contain multiple data sets and versions. The speaker notes that data quanta are designed to deliver value in themselves and can be enhanced over time. By adhering to data contracts and quality standards, data producers can offer reliable and trustworthy data to consumers. The discussion also touches on the potential value of meshing different data quanta together and the importance of community contributions and open sharing in the context of data mesh.
Due to health-related issues, we are on a temporary hiatus for new episodes. Please enjoy this rerelease of episode #130 with my partner in our weekly data mesh roundtables Jean-Georges Perrin. There are a lot of interesting things to take away from this. A biggie is to have an early thesis about what to drive towards - what will drive value early? Doing data mesh doesn't simply create value. And you need to build momentum. There's a lot here to learn about how to apply good software engineering practices to data with data mesh.
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Data Mesh at PayPal blog post: https://medium.com/paypal-tech/the-next-generation-of-data-platforms-is-the-data-mesh-b7df4b825522
JGP's All Things Open talk (free virtual registration): https://2022.allthingsopen.org/sessions/building-a-data-mesh-with-open-source-technologies/
JGP's LinkedIn: https://www.linkedin.com/in/jgperrin/
JGP's Twitter: @jgperrin / https://twitter.com/jgperrin
JGP's YouTube: https://www.youtube.com/c/JeanGeorgesPerrin
JGP's Website: https://jgp.ai/
In this episode, Scott interviewed Jean-Georges Perrin AKA JGP, Intelligence Platform Lead at PayPal. JGP is probably the first guest to lean into using "data quantum" instead of "data product". JGP did want to emphasize that, as of now, he was only discussing the implementation for his team, the GCSC IA (Global Credit Risk, Seller Risk, Collections Intelligence Automation), within PayPal.
Some key takeaways/thoughts from JGP's point of view:
JGP started the conversation talking about how in his team, he's really leaning into the idea that software engineering and data engineering are not that different. Zhamak has discussed this too. We should focus on sharing practices so we all create better software and infrastructure. For JGP, data engineering work in most organizations has followed a very waterfall approach. However, his team has been mostly working in an Agile manner. Therefore it wasn't a huge switch to their ways of working - like it is at many organizations - once they started doing data mesh. And luckily, there was already an appetite for changing the way they were tackling data challenges.
In the spirit of being agile, and capital A Agile as well, PayPal set out on their data mesh journey. They wanted to do an MVP, but what was the P? Minimum Viable Data Product/Quantum? Minimum Viable Platform? Both? Minimum Viable Mesh? JGP recommends looking at what you want to deliver as a minimum unit of value. PayPal already had extensive data platform expertise, so they were able to focus on delivering data products/quanta (plural of data quantum) while working in parallel to build out their initial data quantum and mesh capabilities. As many guests have noted, it's dangerous to only do a minimum viable data product/quantum.
PayPal has been building data platforms for a long time. As mentioned by JGP, they were one of the pioneers of the self-service data platform concept. But data mesh offered a path to faster and easier data discovery, to making it easier to use data in a governed way, and to increased trust in data by the data consumers - their first consumers being data scientists. A big benefit of addressing those needs is that those data scientists can better tell whether the data they access is the right data for their use case.
One thing JGP emphasized that's significantly helping PayPal move forward is standardizing APIs across data quanta. Those are not data access - or analytical - APIs, as JGP thinks those will just never work all that well. Instead, as their audience is data scientists only to start, everything anyone needs other than the actual 1s and 0s of the data is accessible via Python APIs: the metadata, the observability/trust data, etc. The data scientists then use notebooks to work with the data. Standard APIs mean data consumers only have to learn one interface. This is similar in concept to what many are doing with data marketplaces - one standardized way to interact with the information about the data quanta.
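As a rough illustration of that "one interface" idea from the consumer side, here is a hedged Python sketch of how a data scientist might poke at a quantum from a notebook; the client class, method names, and returned fields are all hypothetical stand-ins, not PayPal's actual library.

    class DataQuantumClient:
        """Hypothetical client: one consistent way to reach a quantum's metadata."""

        def __init__(self, name: str):
            self.name = name

        def metadata(self) -> dict:
            # Stubbed values; a real implementation would call the quantum's APIs.
            return {"quantum": self.name, "domain": "seller-risk", "versions": ["1.0.0"]}

        def observability(self) -> dict:
            return {"freshness_hours": 2, "quality_score": 0.98, "last_run": "ok"}

        def contract(self) -> dict:
            return {"schema": {"account_id": "string"}, "freshness_hours": 24}

    q = DataQuantumClient("seller-risk-scores")   # name is illustrative
    print(q.metadata())          # schema, owner, domain, versions
    print(q.observability())     # trust signals: freshness, quality, run status
    print(q.contract())          # the contract currently in force
    # The data itself would then be read through the usual analytical engine,
    # not through these APIs.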
PayPal is using the terms data product and data quantum to mean two separate things. A data product is simply a product powered by data and analytics; those have been around for quite some time. But PayPal is looking at data quanta like sidecars, used specifically to power more and more of their data products going forward.
PayPal has invested heavily in making data contracts work well, per JGP and earlier PayPal guest Jay Sen. They've been building APIs to make it far easier to consume data contracts as people learn more about a data quantum. And as mentioned before, they can consume observability metrics via API as well. When asked how they set their actual contractual terms, JGP explained that data producers initially put out some contractual terms and then may adjust those terms as data consumers request. It's important for data producers not to set their data contract obligations too strictly unless there is a user-based need.
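A small, assumed example of that negotiation step: a helper a producer might use when a consumer asks for a tighter freshness obligation. The function name and the policy it encodes are illustrative sketches, not anything stated in the episode.

    def review_freshness_request(current_hours: int, requested_hours: int,
                                 achievable_hours: int) -> int:
        """Return the freshness obligation (in hours) the producer should publish.

        Tighten only when a consumer actually needs it and the pipeline can
        deliver it; otherwise keep the existing, looser obligation.
        """
        if requested_hours >= current_hours:
            return current_hours                        # no user-based need to tighten
        return max(requested_hours, achievable_hours)   # tighten only as far as feasible

    # e.g. current SLO of 24h, consumer asks for 4h, pipeline can do 6h -> publish 6h
    print(review_freshness_request(24, 4, 6))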
JGP made the good and often unspoken point that the term domain has lost a lot of its meaning. It can mean a very high-level domain like Marketing, Finance, Sales, or HR. Even in software companies, a domain could be Product. But at PayPal, they are being quite strict about what they mean by domain in data mesh: it is a small-scale sub-domain - think two-pizza-team size - and they enforce a strict 1:1 relationship of one data quantum per domain; and of course, no cross-domain source data quanta either. That way, each small domain can focus on creating a great data quantum instead of worrying too much about how big each data quantum should be. The scope should never get that huge at two-pizza-team size.
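A minimal, purely illustrative sketch of what enforcing that strict 1:1 rule could look like in code (the registry class and names are assumptions):

    class MeshRegistry:
        """Toy registry enforcing one data quantum per (sub-)domain."""

        def __init__(self):
            self._quantum_by_domain: dict[str, str] = {}

        def register(self, domain: str, quantum: str) -> None:
            existing = self._quantum_by_domain.get(domain)
            if existing is not None and existing != quantum:
                raise ValueError(
                    f"Domain '{domain}' already owns '{existing}'; "
                    "the 1:1 rule forbids a second quantum."
                )
            self._quantum_by_domain[domain] = quantum

    registry = MeshRegistry()
    registry.register("seller-risk", "seller-risk-quantum")
    # registry.register("seller-risk", "collections-quantum")  # would raise ValueError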
Back to APIs, PayPal is implementing an API-first approach: APIs for the data quantum control plane, observability APIs, and data discovery APIs. It's the preferred way of working for their initial consumers - data scientists. However, as mentioned previously, JGP does not believe analytical APIs - that is, APIs designed to do things like filtering and returning many hundreds to thousands or more results - are really feasible. Definitely not now, and possibly not ever. So in his view, APIs are great for getting at the metadata but not at the data for analytical use.
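Read as a design constraint, that API-first stance could be sketched as an interface that exposes control, observability, and discovery but deliberately no bulk data access; this is a hedged illustration with assumed names, not the team's actual specification.

    from abc import ABC, abstractmethod

    class DataQuantumInterface(ABC):
        """Hypothetical API-first surface: metadata in, but no analytical data out."""

        @abstractmethod
        def control(self, action: str) -> dict:
            """Control plane: e.g. trigger a refresh or pause publication."""

        @abstractmethod
        def observability(self) -> dict:
            """Observability: freshness, quality metrics, recent run status."""

        @abstractmethod
        def discover(self) -> dict:
            """Discovery: schema, owner, versions, and the active data contract."""

        # Intentionally no query()/read() method: large filtered result sets are
        # left to the analytical engines, per the view described above.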
JGP wrapped up by sharing how our tooling must evolve so we don't have to think in terms of such a hard wall between analytical and operational. There will always be analytical and operational workloads, but our systems can evolve to support both. We aren't there yet, though.
Quick tidbit:
If you are just delivering data, the 1s and 0s, you are not delivering the necessary trust.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
All music used in this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf