Astronomer's Role in the Airflow Ecosystem: A Deep Dive with Pete DeJoy
Mar 16, 2025
Pete DeJoy, co-founder and product lead at Astronomer, shares his extensive experience with Airflow, discussing its evolution and the upcoming enhancements in Airflow 3. He highlights Astronomer's commitment to improving data operations and to community involvement. The conversation covers the critical role of data observability through Astro Observe, innovative use cases such as the Texas Rangers' in-game analytics, and the shifting landscape of data engineering roles, emphasizing collaboration and advanced tooling in the modern data ecosystem.
Astronomer's deep involvement in the Airflow project, contributing over 60% of its code, underscores its pivotal role in fostering community engagement and project health.
The upcoming Airflow 3 release will enhance architectural capabilities, including remote task execution and multi-language support, to better address evolving data engineering demands.
As data engineering increasingly focuses on critical data products, Airflow has become essential for operational management, regulatory compliance, and ensuring data quality.
Deep dives
The Evolution of Astronomer and Airflow
Astronomer has evolved significantly since its inception in 2018, with a strong focus on Apache Airflow for data orchestration. The company began as a data services business and became deeply intertwined with Airflow because many of its early client projects centered on building Airflow pipelines. With over 60% of Airflow's code contributed by Astronomer and many of its team members active in the project, the company positions itself as a pivotal player in the Airflow community. That relationship reflects a commitment to the project's health and to community engagement, helping ensure that Airflow continues to thrive.
Airflow's Growing Popularity and Ecosystem Position
Airflow has solidified its position as a leading data orchestration tool, with strong growth in adoption reflected in rising user engagement and downloads. Recent survey results point to a fast-growing community, and Airflow is cited as the most contributed-to Apache project to date. As newer projects such as Dagster and Prefect emerge, Airflow has become the benchmark they are compared against, a sign of its influence on the modern data landscape. The upcoming Airflow 3 release is expected to keep pace with changing data engineering needs, including better data awareness and improved abstractions for event-driven architectures.
Challenges in Data Engineering and the Role of Airflow
The role of data engineering has shifted dramatically, evolving from basic analytics to managing critical data products that affect business outcomes. This shift highlights the growing importance of Airflow as businesses now rely on it for operational data management, regulatory compliance, and machine learning workloads. With increased scrutiny on data quality and pipeline reliability, teams are recognizing the need for robust orchestration solutions. Thus, Airflow is being positioned not just as a tool, but as a vital component for maintaining overall data health and ensuring timely delivery of critical insights.
Innovations in Airflow 3 and Future Directions
Airflow 3 is poised to introduce significant enhancements, including architectural improvements that enable remote task execution and multi-language support, opening Airflow to a wider range of users. The new version also adds data awareness, connecting workflows to the underlying tables and assets they manipulate. These changes are meant to improve scheduling and resource management across diverse infrastructure, meeting the evolving technical demands of data teams. The overall goal is a more developer-friendly environment that broadens accessibility without letting Airflow's existing complexity get in the way of usability.
Addressing Tooling Sprawl in the Data Ecosystem
The data ecosystem is currently grappling with tooling sprawl, which can complicate workflows and hinder efficiency for data teams. Astronomer is addressing this issue by promoting Airflow as a central orchestration solution that enables better integration across various data tools and services. With an emphasis on observability and operational capabilities, Astronomer seeks to consolidate disparate data processes into a unified platform that simplifies monitoring and management. As organizations strive to make sense of their complex data infrastructures, streamlining their tools will become crucial in enhancing productivity and performance.
Summary
In this episode of the Data Engineering Podcast, Pete DeJoy, co-founder and product lead at Astronomer, talks about building and managing Airflow pipelines on Astronomer and the upcoming improvements in Airflow 3. Pete shares his journey into data engineering, discusses Astronomer's contributions to the Airflow project, and highlights the critical role of Airflow in powering operational data products. He covers the evolution of Airflow, its position in the data ecosystem, and the challenges data engineers face, including infrastructure management and observability. The conversation also touches on the upcoming Airflow 3 release, which introduces data awareness, architectural improvements, and multi-language support, and on Astro Observe, Astronomer's observability suite, which provides insights and proactive recommendations for Airflow users.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
Your host is Tobias Macey, and today I'm interviewing Pete DeJoy about building and managing Airflow pipelines on Astronomer and the upcoming improvements in Airflow 3.
Interview
Introduction
Can you describe what Astronomer is and the story behind it?
How would you characterize the relationship between Airflow and Astronomer?
Astronomer just released the State of Airflow 2025 Report yesterday, and it is the largest data engineering survey ever, with over 5,000 respondents. Can you talk a bit about the top-level findings in the report?
What about the overall growth of the Airflow project over time?
From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.