Due to health-related issues, we are on a temporary hiatus for new episodes. Please enjoy this rerelease of episode #130 with Jean-Georges Perrin, my partner in our weekly data mesh roundtables. There are a lot of interesting things to take away from this one. A biggie is to have an early thesis about what to drive towards: what will drive value early? Doing data mesh doesn't create value on its own, and you need to build momentum. There's a lot here to learn about how to apply good software engineering practices to data with data mesh.
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Data Mesh at PayPal blog post: https://medium.com/paypal-tech/the-next-generation-of-data-platforms-is-the-data-mesh-b7df4b825522
JGP's All Things Open talk (free virtual registration): https://2022.allthingsopen.org/sessions/building-a-data-mesh-with-open-source-technologies/
JGP's LinkedIn: https://www.linkedin.com/in/jgperrin/
JGP's Twitter: @jgperrin / https://twitter.com/jgperrin
JGP's YouTube: https://www.youtube.com/c/JeanGeorgesPerrin
JGP's Website: https://jgp.ai/
In this episode, Scott interviewed Jean-Georges Perrin AKA JGP, Intelligence Platform Lead at PayPal. JGP is probably the first guest to lean into using "data quantum" instead of "data product". JGP wanted to emphasize that he was only discussing the current implementation for his team, GCSC IA (Global Credit Risk, Seller Risk, Collections Intelligence Automation), within PayPal.
Some key takeaways/thoughts from JGP's point of view:
JGP started the conversation by talking about how his team is really leaning into the idea that software engineering and data engineering are not that different; Zhamak has discussed this too. We should focus on sharing practices so we all create better software and infrastructure. For JGP, data engineering work in most organizations has followed a very waterfall approach, but his team has mostly worked in an Agile manner. So once they started doing data mesh, it wasn't the huge switch in ways of working that it is at many organizations. And luckily, there was already an appetite for changing the way they were tackling data challenges.
In the spirit of being agile, and capital A Agile as well, PayPal set out on their data mesh journey. They wanted to do an MVP, but what was the P? Minimum Viable Data Product/Quantum? Minimum Viable Platform? Both? Minimum Viable Mesh? JGP recommends looking at what you want to deliver as a minimum unit of value. PayPal already had extensive data platform expertise, so they were able to focus on delivering data products/quanta (plural of data quantum), but they worked in parallel to build out their initial data quantum and mesh capabilities. As many guests have noted, it's dangerous to only do a minimum viable data product/quantum.
PayPal has been building data platforms for a long time; according to JGP, they were one of the pioneers of the self-service data platform concept. But data mesh offered a path to faster and easier data discovery, to making it easier to use data in a governed way, and to increased trust in data by the data consumers, their first consumers being data scientists. A big benefit of addressing those needs is that those data scientists can better tell whether the data they access is the right data for their use case.
One thing JGP emphasized that's significantly helping PayPal move forward is standardizing APIs across data quanta. These are not data access (or analytical) APIs, as JGP thinks those will just never work all that well. Instead, since their audience is only data scientists to start, everything anyone needs other than the actual 1s and 0s of the data is accessible via Python APIs: the metadata, the observability/trust data, etc. The data scientists then use notebooks to work with the data itself. And standard APIs mean data consumers only have to learn one interface. This is similar in concept to what many are doing with data marketplaces: one standardized way to interact with the information about the data quanta.
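To make the "one interface" idea concrete, here is a minimal Python sketch, assuming a hypothetical `DataQuantum` base class with `metadata()` and `observability()` methods. None of these names come from PayPal, and the quanta and their values are illustrative only. The point is that a single consumer function works against every quantum because they all expose the same interface, while the data itself is read elsewhere (e.g. in a notebook).

```python
from abc import ABC, abstractmethod
from typing import Any


class DataQuantum(ABC):
    """Hypothetical standard interface every data quantum exposes.

    Only information *about* the data is served here; the data itself
    is read separately by the consumer.
    """

    @abstractmethod
    def metadata(self) -> dict[str, Any]:
        """Descriptive metadata: name, owner, schema."""

    @abstractmethod
    def observability(self) -> dict[str, Any]:
        """Trust signals such as freshness and row-count checks."""


class SellerRiskQuantum(DataQuantum):
    # Illustrative values only; not real PayPal metadata.
    def metadata(self) -> dict[str, Any]:
        return {"name": "seller_risk", "owner": "seller-risk-team",
                "schema": {"seller_id": "string", "risk_score": "float"}}

    def observability(self) -> dict[str, Any]:
        return {"freshness_hours": 2, "row_count_check": "passed"}


class CollectionsQuantum(DataQuantum):
    # Illustrative values only; not real PayPal metadata.
    def metadata(self) -> dict[str, Any]:
        return {"name": "collections", "owner": "collections-team",
                "schema": {"account_id": "string", "days_past_due": "int"}}

    def observability(self) -> dict[str, Any]:
        return {"freshness_hours": 30, "row_count_check": "failed"}


def summarize(quantum: DataQuantum, max_age_hours: int = 24) -> str:
    """One consumer function works for every quantum because the interface is shared."""
    meta = quantum.metadata()
    obs = quantum.observability()
    fresh = obs["freshness_hours"] <= max_age_hours
    healthy = obs["row_count_check"] == "passed"
    status = "ok" if fresh and healthy else "check before use"
    return f"{meta['name']} (owner: {meta['owner']}): {status}"
```

Calling `summarize` on each illustrative quantum gives `seller_risk (owner: seller-risk-team): ok` and `collections (owner: collections-team): check before use`. The consumer never needs to know which team built which quantum.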
PayPal is using the terms data product and data quantum for two separate things. A data product is simply a product powered by data and analytics; those have been around for quite some time. But PayPal looks at data quanta more like sidecars, used specifically to power more and more of their data products going forward.
PayPal has invested heavily in making data contracts work well, per JGP and earlier PayPal guest Jay Sen. They've been building APIs to make it far easier to consume data contracts as people learn more about a data quantum, and, as mentioned before, they can consume observability metrics via API as well. As for how they set the actual contractual terms, the data producers initially put out some terms and then may adjust them as data consumers request. It's important for data producers not to set their data contract obligations too strictly unless there is a user-based need.
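A minimal sketch of the "start loose, tighten on request" approach to contract terms, assuming a hypothetical `DataContract` with two illustrative terms (maximum staleness and minimum completeness). This is not PayPal's contract format; it only shows terms changing when a consumer states a concrete need, and observed metrics, such as those an observability API might return, being checked against the current terms.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class ContractTerms:
    """Hypothetical service-level terms a producer publishes for a data quantum."""
    max_staleness_hours: int      # how old the newest data may be
    min_completeness_pct: float   # share of expected rows that must arrive


@dataclass
class DataContract:
    quantum: str
    terms: ContractTerms
    history: list[ContractTerms] = field(default_factory=list)

    def request_change(self, consumer: str, reason: str, **changes: object) -> ContractTerms:
        """Adjust terms only when a consumer gives a use-case reason."""
        if not reason:
            raise ValueError(f"{consumer} must give a use-case reason")
        new_terms = replace(self.terms, **changes)  # raises TypeError on unknown terms
        self.history.append(self.terms)             # keep the previous terms for audit
        self.terms = new_terms
        return new_terms

    def is_met(self, staleness_hours: float, completeness_pct: float) -> bool:
        """Check observed metrics against the current terms."""
        return (staleness_hours <= self.terms.max_staleness_hours
                and completeness_pct >= self.terms.min_completeness_pct)
```

With these illustrative numbers, `DataContract("seller_risk", ContractTerms(48, 95.0))` accepts data that is 24 hours stale, but after a consumer calls `request_change("fraud-models", "daily retraining needs fresher data", max_staleness_hours=12)` the same observation no longer meets the contract. The producer starts loose and only tightens when someone actually needs it.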
JGP made a good and often unspoken point: the term domain has lost a lot of its meaning. It can mean a very high-level domain like Marketing, Finance, Sales, or HR; in software companies, a domain could even be Product. But at PayPal, they are being quite strict about what they mean by domain in data mesh: it is a small-scale sub-domain (think two-pizza team size), and they are enforcing a strict 1:1 relationship of one data quantum per domain. And, of course, no data quanta sourced across domains either. That way, each small domain can focus on creating a great data quantum instead of worrying too much about how big each data quantum should be; at two-pizza team size, the scope should never get that huge.
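The two domain rules above can be sketched as a registry check. This is a hypothetical `MeshRegistry`, not a PayPal component, and it reads the cross-domain rule as "a data quantum may only source data from its own domain", which is our interpretation of the rule.

```python
class MeshRegistry:
    """Hypothetical registry enforcing two rules:
    one data quantum per (small) domain, sourced only from that domain."""

    def __init__(self) -> None:
        self._quantum_by_domain: dict[str, str] = {}

    def register(self, domain: str, quantum: str, source_domains: set[str]) -> None:
        # Rule 1: strict 1:1, so a domain that already owns a quantum is rejected.
        if domain in self._quantum_by_domain:
            raise ValueError(
                f"domain {domain!r} already owns {self._quantum_by_domain[domain]!r}")
        # Rule 2 (our reading): no sources from outside the owning domain.
        foreign = source_domains - {domain}
        if foreign:
            raise ValueError(f"{quantum!r} sources from other domains: {sorted(foreign)}")
        self._quantum_by_domain[domain] = quantum

    def quantum_for(self, domain: str) -> str:
        return self._quantum_by_domain[domain]
```

Pushing both checks into registration means a team finds out at creation time, not in production, that its quantum is too broad.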
Back to APIs: PayPal is implementing an API-first approach, with APIs for the data quantum control plane, for observability, and for data discovery. It's the preferred way of working for their initial consumers, data scientists. However, as mentioned previously, JGP does not believe analytical APIs (that is, APIs designed to do things like filtering and returning many hundreds to thousands or more results) are really feasible. Definitely not now, and possibly not ever. So in his view, APIs are great for getting at the metadata but not at the data for analytical use.
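A sketch of a discovery API in this spirit, assuming a hypothetical in-memory catalog of `QuantumEntry` records: the call returns only the names of matching data quanta, never data rows, leaving the analytical work to the consumer's own tooling. The entry names, domains and tags are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantumEntry:
    """Hypothetical catalog record: metadata only, no data rows."""
    name: str
    domain: str
    description: str
    tags: frozenset[str]


# Illustrative catalog; a real one would be served by a discovery service.
CATALOG = [
    QuantumEntry("seller_risk_scores", "seller-risk",
                 "Daily risk score per seller", frozenset({"risk", "seller"})),
    QuantumEntry("collections_status", "collections",
                 "Delinquency status per account", frozenset({"collections", "account"})),
]


def discover(keyword: str, catalog: list[QuantumEntry] = CATALOG) -> list[str]:
    """Match a keyword against tags or description and return quantum names only.

    No filtering of the underlying data happens here; the consumer reads the
    data itself through separate tooling.
    """
    kw = keyword.lower()
    return [e.name for e in catalog
            if kw in e.tags or kw in e.description.lower()]
```

For example, `discover("risk")` returns `["seller_risk_scores"]`; the data scientist then opens that quantum in a notebook rather than asking the API for rows.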
JGP wrapped up by sharing how our tooling must evolve so we don't have to think about such a hard wall between analytical and operational. There will always be analytical and operational workloads, but our systems can evolve to support both. We aren't there yet, though.
Quick tidbit:
If you are just delivering data, the 1s and 0s, you are not delivering the necessary trust.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf