How Orchestration Impacts Data Platform Architecture
Dec 16, 2024
Hugo Lu, CEO and co-founder of Orchestra, delves into the vital role of data orchestration in platform architecture. He highlights how the choice of orchestration engines influences data flow management and overall efficiency. The discussion covers the evolution of orchestration from early models to modern applications like Kubernetes, reveals the challenges of traditional systems, and emphasizes the need for flexibility in architecture. Lu also addresses the distinct demands of analytical versus product-oriented applications, especially with the rise of AI integration.
Data orchestration plays a crucial role in managing complex data workflows, enabling systematic data ingestion, transformation, and quality checks.
Effective orchestration strategies become increasingly essential as organizations scale, necessitating centralized visibility and communication across multiple data components.
The future of data orchestration is heavily leaning towards integrating AI and enhancing self-service capabilities to empower data teams and improve workflows.
Deep dives
Defining Data Orchestration
Data orchestration is defined as the scheduling, triggering, and monitoring of data workflows, essential for enabling data processes to function effectively. This involves managing a series of tasks that depend on one another, which can become complex when dealing with multiple data sources and types. Traditional scheduling tools like Cron have evolved, and now orchestration encompasses modern tools like Kubernetes and CI/CD pipelines to manage these dependencies more efficiently. A robust orchestration layer ensures that data ingestion, transformation, and quality checks occur in a systematic manner that supports the overall data lifecycle.
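At its core, the dependency management described above can be sketched as a small directed acyclic graph of tasks executed in topological order. The task names and bodies below are hypothetical stand-ins; real orchestrators layer scheduling, retries, alerting, and monitoring on top of this basic idea.

```python
# Minimal sketch of an orchestrator's core job: run dependent tasks in
# dependency order and surface each step. Uses only the standard library.
from graphlib import TopologicalSorter

def ingest():
    print("ingesting raw data")

def transform():
    print("transforming data")

def quality_check():
    print("running quality checks")

# Dependencies: transform needs ingest; quality_check needs transform.
dag = {
    "transform": {"ingest"},
    "quality_check": {"transform"},
}
tasks = {"ingest": ingest, "transform": transform, "quality_check": quality_check}

# static_order() yields each task only after all of its dependencies.
for name in TopologicalSorter(dag).static_order():
    tasks[name]()  # a real orchestrator would also retry, alert, and log
```

A production system replaces the in-process loop with distributed workers, but the dependency graph remains the organizing abstraction.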
Navigating Complexity in Data Systems
As organizations scale, the complexity of data systems often increases, necessitating an effective orchestration strategy. Early-stage data platforms may require minimal orchestration, functioning adequately with simple scripts and direct queries. However, as the number of data sources grows, coupling and managing the various components—like ingestion services and transformation models—becomes increasingly challenging. Organizations face difficulties in maintaining visibility and communication across these components, making centralized orchestration crucial for streamlined data workflows.
Building Trust with Metadata Catalogs
A key challenge for data teams is fostering trust in the data provided to various stakeholders, making metadata catalogs essential for improving transparency. By maintaining an accurate catalog of data assets, teams can educate users about data freshness and lineage, ultimately promoting confidence in data-driven decisions. However, constructing and maintaining these catalogs requires significant engineering efforts, especially when trying to integrate different sources and tools across the data stack. This often leads to an increased workload, where data teams find themselves building numerous workflows purely for monitoring metadata, amplifying the need for better orchestration solutions.
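To make the freshness and lineage idea concrete, a catalog entry can be thought of as a record with an asset name, a last-refreshed timestamp, and a list of upstream inputs. The shape below is a hypothetical simplification; real catalogs model much richer schemas.

```python
# A minimal, illustrative shape for a metadata catalog entry tracking
# freshness and lineage. Field names here are assumptions, not a real API.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CatalogEntry:
    name: str                 # fully qualified asset name
    last_refreshed: datetime  # freshness: when the asset last updated
    upstream: list[str] = field(default_factory=list)  # lineage: inputs

entry = CatalogEntry(
    name="analytics.orders_daily",
    last_refreshed=datetime(2024, 12, 16, tzinfo=timezone.utc),
    upstream=["raw.orders", "raw.customers"],
)
print(entry.name, "depends on", entry.upstream)
```

Even this tiny record shows why catalogs create engineering work: every tool in the stack must emit these fields consistently for the lineage graph to stay trustworthy.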
Shifts in Orchestration Tools
The current trend in orchestration tools is shifting from monolithic to more flexible, federated approaches, enabling organizations to handle a variety of data processes in one unified system. Tools need to accommodate both centralized control for visibility and decentralized execution for efficiency across multiple environments. For instance, orchestration systems are evolving to allow for event-based triggers rather than relying solely on time-based scheduling, reflecting the growing need for real-time data availability. These trends signal a move towards comprehensive platforms that support diverse orchestration needs across analytical and operational data use cases.
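The shift from time-based to event-based triggering can be illustrated with a toy contrast: a cron-style loop fires on a fixed cadence whether or not new data exists, while an event-driven consumer fires once per arriving event. Function names and the in-memory event list below are illustrative assumptions, not any orchestrator's real API.

```python
# Hypothetical sketch contrasting time-based and event-based triggering.

def run_pipeline(path: str) -> str:
    # Stand-in for a real ingestion/transformation run.
    return f"processed {path}"

# Time-based: fire on a schedule, regardless of whether data has arrived.
def cron_style(runs: int) -> list[str]:
    return [run_pipeline("warehouse/daily_batch") for _ in range(runs)]

# Event-based: fire once per event, as soon as data actually lands.
def event_style(events: list[str]) -> list[str]:
    # In practice `events` would be an object-store notification stream
    # or a message-queue subscription, not an in-memory list.
    return [run_pipeline(p) for p in events]

print(event_style(["landing/orders_001.csv", "landing/orders_002.csv"]))
```

The event-based form does no wasted work and reacts as soon as data lands, which is why real-time availability requirements are pushing orchestrators in this direction.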
The Future of Data Management
Looking ahead, the integration of AI and machine learning into data orchestration workflows is expected to reshape how platforms are developed and utilized. As AI models become increasingly embedded into business processes, the orchestration of data will need to adapt, ensuring that these systems communicate seamlessly across various teams. Furthermore, the focus on data team empowerment and self-service capabilities is likely to expand, reinforcing the notion that effective data management should not merely be viewed as a cost center. A collaborative future where data engineers and application developers jointly enhance workflows will be essential for using data effectively and maximizing its potential value.
Summary
The core task of data engineering is managing the flows of data through an organization. Ensuring that those flows execute on schedule and without error is the role of the data orchestrator. Which orchestration engine you choose shapes how you architect the rest of your data platform. In this episode Hugo Lu shares his thoughts, as the founder of an orchestration company, on how to think about data orchestration and data platform design as we navigate the current era of data engineering.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management
It’s 2024, why are we still doing data migrations by hand? Teams spend months—sometimes years—manually converting queries and validating data, burning resources and crushing morale. Datafold's AI-powered Migration Agent brings migrations into the modern era. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today to learn how Datafold can automate your migration and ensure source to target parity.
As a listener of the Data Engineering Podcast you clearly care about data and how it affects your organization and the world. For even more perspective on the ways that data impacts everything around us don't miss Data Citizens® Dialogues, the forward-thinking podcast brought to you by Collibra. You'll get further insights from industry leaders, innovators, and executives in the world's largest companies on the topics that are top of mind for everyone. In every episode of Data Citizens® Dialogues, industry leaders unpack data’s impact on the world, from big picture questions like AI governance and data sharing to more nuanced questions like, how do we balance offense and defense in data management? In particular I appreciate the ability to hear about the challenges that enterprise scale businesses are tackling in this fast-moving field. The Data Citizens Dialogues podcast is bringing the data conversation to you, so start listening now! Follow Data Citizens Dialogues on Apple, Spotify, YouTube, or wherever you get your podcasts.
Your host is Tobias Macey and today I'm interviewing Hugo Lu about the data platform and orchestration ecosystem and how to navigate the available options
Interview
Introduction
How did you get involved in building data platforms?
Can you describe what an orchestrator is in the context of data platforms?
There are many other contexts in which orchestration is necessary. What are some examples of how orchestrators have adapted (or failed to adapt) to the times?
What are the core features that are necessary for an orchestrator to have when dealing with data-oriented workflows?
Beyond the bare necessities, what are some of the other features and design considerations that go into building a first-class data platform or orchestration system?
There have been several generations of orchestration engines over the past several years. How would you characterize the different coarse groupings of orchestration engines across those generational boundaries?
How do the characteristics of a data orchestrator influence the overarching architecture of an organization's data platform/data operations?
What about the reverse?
How have the cycles of ML and AI workflow requirements impacted the design requirements for data orchestrators?
What are the most interesting, innovative, or unexpected ways that you have seen data orchestrators used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on data orchestration?
When is an orchestrator the wrong choice?
What are your predictions and/or hopes for the future of data orchestration?
From your perspective, what is the biggest thing data teams are missing in the technology today?
Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.