Build Your Data Transformations Faster And Safer With SDF

Oct 6, 2024

Lukas Schulte, Co-founder and CEO of SDF, dives into the revolutionary features of this SQL transformation tool designed for data privacy, governance, and quality. He discusses SDF's unique architecture built with Rust, enhancing both performance and reliability. Schulte explores the evolution of data transformation from static analysis to type-safe execution. He highlights the crucial role of classifiers in data governance and the ongoing development plans, including support for Python models, aimed at further improving developer workflows.


ANECDOTE

SDF Origin Story

Lukas Schulte's work with sensor systems and ML models led to growing data management challenges.
These challenges, including user data, board meetings, and data privacy regulations, inspired the creation of SDF.

INSIGHT

Data Engineering Tooling Gap

Existing data transformation tools lack the robust static analysis and debugging capabilities of software engineering tools.
SDF aims to bring these advanced features to data engineering, improving pipeline reliability and developer experience.

INSIGHT

SDF's Focus on SQL Understanding

SDF differentiates itself by focusing on the engine and SQL understanding, rather than the authoring surface.
This approach allows SDF to work with various SQL dialects and existing dbt projects.


Summary
In this episode of the Data Engineering Podcast, Lukas Schulte, co-founder and CEO of SDF, explores the development and capabilities of this fast and expressive SQL transformation tool. From its origins as a solution for addressing data privacy, governance, and quality concerns in modern data management, to its unique features like static analysis and type correctness, Lukas dives into what sets SDF apart from other tools like dbt and SQLMesh. Tune in for insights on building a business around a developer tool, the importance of community and user experience in the data engineering ecosystem, and plans for future development, including supporting Python models and enhancing execution capabilities.
Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Imagine catching data issues before they snowball into bigger problems. That’s what Datafold’s new Monitors do. With automatic monitoring for cross-database data diffs, schema changes, key metrics, and custom data tests, you can catch discrepancies and anomalies in real time, right at the source. Whether it’s maintaining data integrity or preventing costly mistakes, Datafold Monitors give you the visibility and control you need to keep your entire data stack running smoothly. Want to stop issues before they hit production? Learn more at dataengineeringpodcast.com/datafold today!
Your host is Tobias Macey, and today I'm interviewing Lukas Schulte about SDF, a fast and expressive SQL transformation tool that understands your schema.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what SDF is and the story behind it?
- What's the story behind the name?
What problem are you solving with SDF?
- dbt has been the dominant player for SQL-based transformations for several years, with other notable competition in the form of SQLMesh. Can you give an overview of the Venn diagram for features and functionality across SDF, dbt, and SQLMesh?
Can you describe the design and implementation of SDF?
- How have the scope and goals of the project changed since you first started working on it?
What does the development experience look like for a team working with SDF?
- How does that differ between the open and paid versions of the product?
What are the features and functionality that SDF offers to address intra- and inter-team collaboration?
One of the challenges for any second-mover technology with an established competitor is the adoption/migration path for teams who have already invested in the incumbent (dbt in this case). How are you addressing that barrier for SDF?
- Beyond the core migration path of the direct functionality of the incumbent product is the amount of tooling and communal knowledge that grows up around that product. How are you thinking about that aspect of the current landscape?
What is your governing principle for what capabilities are in the open core and which go in the paid product?
What are the most interesting, innovative, or unexpected ways that you have seen SDF used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on SDF?
When is SDF the wrong choice?
What do you have planned for the future of SDF?

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA