158: The Orchestration Layer as the Data Platform Control Plane With Nick Schrock of Dagster Labs
Oct 4, 2023
Nick Schrock, Founder of Dagster Labs, discusses his background in data engineering and the founding of Dagster Labs. They cover topics such as the evolution of data engineering, fragmentation in data infrastructure, the role of orchestration in data platforms, lessons learned from working with GraphQL, different orchestrators in the data infrastructure landscape, the role of MLOps in data engineering, and the future of data teams and orchestration.
Dagster Labs aims to revolutionize data infrastructure by providing a declarative and data-oriented approach to solving data engineering challenges.
The field of data orchestration tools includes a range of options, each with its own strengths and focus, and Dagster sits between Airflow and dbt, offering a flexible and accessible solution for data teams.
Deep dives
Dagster Labs: Building an Orchestration Tool for Data Infrastructure
Dagster Labs, founded by Nick Schrock, builds Dagster, an orchestration tool aimed at becoming a control plane for data infrastructure. With a background at Facebook and as one of the co-creators of GraphQL, Nick brings his expertise to the world of data engineering. Dagster provides a declarative and data-oriented approach to solving data engineering challenges. The goal is to make data and ML engineers more productive when building pipelines, while allowing flexibility in computation and storage. Dagster Labs is focused on bringing value to centralized data teams, with upcoming features including embedded data quality and consumption management. The vision is a more integrated and efficient data ecosystem with orchestration at its center.
The Landscape of Data Orchestration Tools
The field of data orchestration tools includes a range of options, each with its own strengths and focus. Airflow, for example, offers a task-based DAG system with a Python interface and a user-friendly UI. dbt, on the other hand, specializes in Jinja templating for SQL and is targeted towards software engineer analysts. Prefect and Temporal provide more generic workflow engine capabilities, with Prefect being DAG-less and highly imperative. Dagster sits in between Airflow and dbt, offering a declarative, data-oriented approach with flexibility for any computation. The goal is to make orchestration tools more accessible and valuable for data teams, integrating capabilities like data quality and consumption management.
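To make the "declarative, data-oriented" idea concrete, here is a minimal sketch in plain Python. It is not Dagster's API or any other tool's; the `asset` decorator, the `ASSETS` registry, and `materialize_all` are hypothetical names invented for illustration. Each function declares the data it produces and the upstream data it depends on, and the run order is derived from those declarations instead of being scripted task by task.

```python
from graphlib import TopologicalSorter

# Hypothetical registry: asset name -> (upstream asset names, compute function).
ASSETS = {}


def asset(*deps):
    """Register a function as a data asset that depends on the named upstream assets."""
    def register(fn):
        ASSETS[fn.__name__] = (deps, fn)
        return fn
    return register


@asset()
def raw_orders():
    # Stand-in for loading data from a source system.
    return [{"id": 1, "amount": 20}, {"id": 2, "amount": 35}]


@asset("raw_orders")
def order_totals(raw_orders):
    return sum(row["amount"] for row in raw_orders)


@asset("order_totals")
def revenue_report(order_totals):
    return f"Total revenue: {order_totals}"


def materialize_all():
    """Compute every registered asset in dependency order."""
    # Map each asset to the set of assets that must be computed before it.
    graph = {name: set(deps) for name, (deps, _) in ASSETS.items()}
    order = list(TopologicalSorter(graph).static_order())
    results = {}
    for name in order:
        deps, fn = ASSETS[name]
        results[name] = fn(*(results[d] for d in deps))
    return order, results


if __name__ == "__main__":
    order, results = materialize_all()
    print(order)
    print(results["revenue_report"])
```

In a task-based system the author would wire up the execution order explicitly; in this sketch the order falls out of the declared data dependencies, which is the distinction the summary draws between the two styles.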
The Convergence of Product Engineering and Data Engineering
Nick Schrock highlights the similarities and overlap between product engineering and data engineering. He emphasizes the need for convergence between the two disciplines, as they share common challenges and opportunities. The lessons learned from GraphQL's success in product engineering can be applied to data engineering, such as aligning concepts with users' day-to-day experiences and providing a toolkit for increased productivity. By understanding that developers are smart, busy individuals, teams can build tools that enhance their careers and deliver value. The future lies in a more integrated and cohesive data ecosystem, where data orchestration plays a crucial role in driving efficiency and productivity across both disciplines.
Unlocking the Future with Infinitely Abundant Energy
Beyond data engineering, Nick Schrock envisions a future in the energy transition. With solar energy becoming cheaper than fossil fuel alternatives, the world is on the verge of infinite, virtually free energy. The challenge lies in harnessing and integrating this abundance of energy into new industries and applications. By working on the energy transition, Nick sees opportunities to explore and develop new technologies and systems that leverage intermittent, renewable energy sources. This shift towards clean energy opens the door to advancements in various industries and paves the way for a sustainable and prosperous future.
The role of orchestration in data platforms (19:53)
The importance of operational tools for data pipelines (25:01)
Lessons learned from working with GraphQL (26:19)
The role of the orchestrator in data engineering (34:51)
The boundaries between data infrastructure and product engineering (37:33)
Different orchestrators in the data infrastructure landscape (42:03)
The role of MLOps in data engineering (46:04)
Data Quality and Orchestration (51:04)
Future of Data Teams and Orchestration (54:27)
Final thoughts and takeaways (58:01)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.