
Data Engineering Podcast
Useful Lessons And Repeatable Patterns Learned From Data Mesh Implementations At AgileLab
Podcast summary created with Snipd AI
Quick takeaways
- Data Mesh Boost platform facilitates decentralized data engineering with automation and collaboration.
- Implementing data mesh principles involves metadata interoperability, data contracts, and emphasis on best practices and sustainability.
Deep dives
Building a Data Engineering Practice with Data Mesh Boost
Data Mesh Boost is a platform that helps organizations implement a data mesh and improve their data production practices. It provides templates to speed up building data products, computational governance to apply policies, and a marketplace for collaboration between data product teams. Observability is crucial for distributed orchestration of data transformations, and data contracts are used to define interoperability and avoid breaking changes. Data Mesh Boost enables a decentralized data engineering practice, focusing on automation, knowledge independence, and team autonomy.
Challenges and Shifts in Data Engineering Practices
Implementing data mesh principles changes how data engineering is practiced. Metadata interoperability and a data-contract-first approach are key elements. Customizable templates and scaffolding mechanisms help data product teams implement best practices. Computational governance ensures compliance and protects data quality. Collaboration and change management are facilitated through marketplace features. Overall, Data Mesh Boost promotes a practice-centric mindset, prioritizing best practices and sustainability.
Technical Elements of Data Mesh Boost
Data Mesh Boost introduces technical elements to support data engineering activities. The Data Product Builder facilitates the implementation of standards and onboarding of new technologies. The Data Product Provisioner deploys data products as a single unit, applying computational policies and managing dependencies. The Data Product Marketplace enables data product discovery, collaboration, and change management. Overall, Data Mesh Boost provides customizable frameworks and solutions for implementing data mesh principles.
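The idea of deploying a data product as a single unit, checking policies first and then resolving dependencies, can be illustrated with a minimal sketch. This is not Data Mesh Boost's API: the descriptor layout, the two policies, and the function names below are all hypothetical, chosen only to show the shape of the technique.

```python
from graphlib import TopologicalSorter

# Hypothetical data product descriptor (not the Data Mesh Boost format).
descriptor = {
    "name": "orders",
    "owner": "sales-team",
    "components": {
        "storage": {"kind": "storage", "depends_on": []},
        "ingest_job": {"kind": "workload", "depends_on": ["storage"]},
        "orders_view": {"kind": "output_port", "depends_on": ["ingest_job"]},
    },
}


def check_policies(descriptor):
    """Return a list of policy violations; an empty list means compliant.

    Both policies are illustrative assumptions.
    """
    violations = []
    if not descriptor.get("owner"):
        violations.append("data product must declare an owner")
    kinds = {c["kind"] for c in descriptor.get("components", {}).values()}
    if "output_port" not in kinds:
        violations.append("data product must expose at least one output port")
    return violations


def provisioning_order(descriptor):
    """Order components so each one is deployed after its dependencies."""
    graph = {
        name: set(spec["depends_on"])
        for name, spec in descriptor["components"].items()
    }
    return list(TopologicalSorter(graph).static_order())


def plan(descriptor):
    """Reject non-compliant descriptors, otherwise return the deploy order."""
    violations = check_policies(descriptor)
    if violations:
        raise ValueError("; ".join(violations))
    return provisioning_order(descriptor)


if __name__ == "__main__":
    print(plan(descriptor))
```

Running the policy check before computing the order means a non-compliant data product is rejected as a whole, rather than being partially deployed.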
Building a Sustainable Data Mesh Practice
Establishing a sustainable data mesh practice requires reducing time to market, embracing automation, and ensuring data production scalability. Computational policies and data contracts are used to manage changes and maintain interoperability. Review processes, including code and data contract reviews, are crucial to prevent breaking changes and maintain consistency. Observability platforms play a critical role in managing distributed orchestration. Data Mesh Boost supports the creation of a solid data engineering practice to facilitate successful data mesh implementation.
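A data contract review can be partly automated by comparing the published schema with the proposed one. The sketch below is a simplified illustration, not a feature of Data Mesh Boost: it assumes a contract is just a mapping of column name to type, and it assumes that removing a column or changing its type is breaking while adding a column is not.

```python
def breaking_changes(old_schema, new_schema):
    """List changes that would break consumers of the old contract.

    Schemas are plain dicts of column name -> type name (an assumption
    made for this sketch). Added columns are treated as non-breaking.
    """
    problems = []
    for column, old_type in old_schema.items():
        if column not in new_schema:
            problems.append(f"removed column: {column}")
        elif new_schema[column] != old_type:
            problems.append(
                f"type changed: {column} {old_type} -> {new_schema[column]}"
            )
    return problems


if __name__ == "__main__":
    published = {"order_id": "string", "amount": "decimal", "country": "string"}
    proposed = {"order_id": "string", "amount": "float", "channel": "string"}
    for problem in breaking_changes(published, proposed):
        print(problem)
```

A check like this could run in CI next to code review, so a change that alters the contract is flagged before it reaches consumers.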
Summary
Data mesh is a frequent topic of conversation in the data community, with many debates about how and when to employ this architectural pattern. The team at AgileLab have first-hand experience helping large enterprise organizations evaluate and implement their own data mesh strategies. In this episode Paolo Platter shares the lessons they have learned in that process, the Data Mesh Boost platform that they have built to reduce some of the boilerplate required to make it successful, and some of the considerations to make when deciding if a data mesh is the right choice for you.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
- Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.
- Prefect is the modern Dataflow Automation platform for the modern data stack, empowering data practitioners to build, run and monitor robust pipelines at scale. Guided by the principle that the orchestrator shouldn’t get in your way, Prefect is the only tool of its kind to offer the flexibility to write code as workflows. Prefect specializes in glueing together the disparate pieces of a pipeline, and integrating with modern distributed compute libraries to bring power where you need it, when you need it. Trusted by thousands of organizations and supported by over 20,000 community members, Prefect powers over 100MM business critical tasks a month. For more information on Prefect, visit dataengineeringpodcast.com/prefect.
- The only thing worse than having bad data is not knowing that you have it. With Bigeye’s data observability platform, if there is an issue with your data or data pipelines you’ll know right away and can get it fixed before the business is impacted. Bigeye lets data teams measure, improve, and communicate the quality of your data to company stakeholders. With complete API access, a user-friendly interface, and automated yet flexible alerting, you’ve got everything you need to establish and maintain trust in your data. Go to dataengineeringpodcast.com/bigeye today to sign up and start trusting your analyses.
- Your host is Tobias Macey and today I’m interviewing Paolo Platter about Agile Lab’s lessons learned through helping large enterprises establish their own data mesh
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you share your experiences working with data mesh implementations?
- What were the stated goals of project engagements that led to data mesh implementations?
- What are some examples of projects where you explored data mesh as an option and decided that it was a poor fit?
- What are some of the technical and process investments that are necessary to support a mesh strategy?
- When implementing a data mesh what are some of the common concerns/requirements for building and supporting data products?
- What is the general shape that a product will take in a mesh environment?
- What are the features that are necessary for a product to be an effective component in the mesh?
- What are some of the aspects of a data product that are unique to a given implementation?
- You built a platform for implementing data meshes. Can you describe the technical elements of that system?
- What were the primary goals that you were addressing when you decided to invest in building Data Mesh Boost?
- How does Data Mesh Boost help in the implementation of a data mesh?
- Code review is a common practice in construction and maintenance of software systems. How does that activity map to data systems/products?
- What are some of the challenges that you have encountered around CI/CD for data products?
- What are the persistent pain points involved in supporting pre-production validation of changes to data products?
- Beyond the initial work of building and deploying a data product there is the ongoing lifecycle management. How do you approach refactoring old data products to match updated practices/templates?
- What are some of the indicators that tell you when an organization is at a level of sophistication that can support a data mesh approach?
- What are the most interesting, innovative, or unexpected ways that you have seen Data Mesh Boost used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Data Mesh Boost?
- When is Data Mesh (Boost) the wrong choice?
- What do you have planned for the future of Data Mesh Boost?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers
Links
- AgileLab
- Spark
- Cloudera
- Zhamak Dehghani
- Data Mesh
- Data Fabric
- Data Virtualization
- q-lang
- Data Mesh Boost
- Data Mesh Marketplace
- SourceGraph
- OpenMetadata
- Egeria
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA