Data Engineering Podcast

Latest episodes

60 snips
Dec 8, 2024 • 52min

An Exploration Of The Impediments To Reusable Data Pipelines

Max Beauchemin, a data engineer with two decades of experience and founder of Preset, dives into the complexities of reusable data pipelines. He discusses the "write everything twice" problem, emphasizing the need for collaboration and shared reference implementations. Max explores the challenges of managing diverse SQL dialects and the evolving role of data engineers, likening it to front-end development. He envisions generative AI aiding knowledge distribution and encourages the community to engage in sharing templates to drive innovation in the field.
35 snips
Dec 1, 2024 • 60min

The Art of Database Selection and Evolution

Sam Kleinman, a seasoned software engineer with experience at MongoDB, dives deep into the art of database selection. He discusses the critical trade-offs in database architectures and how they shape system design. Sam warns against the pitfalls of over-engineering and stresses leveraging database capabilities rather than pushing logic to the application layer. He identifies a significant gap in effective testing tools for database interactions, advocating for improved paradigms to ensure reliability. This insightful conversation blends technical expertise with practical advice for modern data management.
9 snips
Nov 26, 2024 • 45min

Bridging Code and UI in Data Orchestration with Kestra

Anna Geller, Product Lead at Kestra and former data engineer at KPMG, dives into the fascinating realm of data orchestration. She explains how Kestra bridges the gap between coding and user interfaces, advocating for a hybrid low-code approach. Anna highlights the limitations of existing tools and how Kestra’s API-first design and scalable architecture tackle these challenges. The conversation also touches on the complexities of managing workflows, the role of real-time data, and the innovative functionalities that empower both technical and non-technical users.
25 snips
Nov 18, 2024 • 40min

Streaming Data Into The Lakehouse With Iceberg And Trino At Going

Ken Pickering, VP of Engineering at Going, leads a data platform team focused on finding the best travel deals. He discusses the complexities of streaming data into a Trino and Iceberg lakehouse, sharing his experience in managing vast flight datasets. Ken elaborates on their dual approach to search strategies—passive and active—and the technologies like Confluent and Databricks that support their operations. He highlights collaboration within the engineering teams and the challenges of maintaining data quality and governance in a rapidly evolving landscape.
Nov 11, 2024 • 56min

An Opinionated Look At End-to-end Code Only Analytical Workflows With Bruin

Burak Karakan, co-founder of Bruin and a seasoned software engineer, discusses the advantages of a code-only approach to data workflows. He emphasizes how a unified data management tool can simplify analytics for mobile gaming companies. Burak details Bruin's open-source architecture, which allows small teams to efficiently manage their data sources. He also covers the evolution of the Bruin toolchain and its role in enhancing collaboration. Finally, he stresses the need for improved data quality and accessibility in analytical systems.
25 snips
Nov 4, 2024 • 48min

Feldera: Bridging Batch and Streaming with Incremental Computation

Feldera's CTO Leonid Ryzhyk, CEO Lalith Suresh, and Chief Science Officer Mihai Budiu dive into the world of incremental computation. They discuss how Feldera bridges batch and streaming data, supporting real-time machine learning applications like fraud detection. The trio highlights the architecture's evolution, emphasizing historical data analysis and feature engineering. They also tackle the skepticism surrounding traditional streaming technologies and explore Feldera's potential in both the open-source community and enterprise solutions.
19 snips
Oct 27, 2024 • 49min

Accelerate Migration Of Your Data Warehouse with Datafold's AI Powered Migration Agent

Gleb Mezhanskiy, CEO and co-founder of Datafold, shares his extensive experience in data management from his time at Autodesk and Lyft. He dives into the complexities of data migrations, detailing challenges like technical debt and the need for effective parity between systems. Gleb reveals how Datafold leverages AI to automate data migration processes, significantly reducing time and effort. He also discusses the importance of monitoring data integrity in real time and offers insights into choosing the right models for secure data handling.
4 snips
Oct 20, 2024 • 58min

Bring Vector Search And Storage To The Data Lake With Lance

Weston Pace, a software engineer at LanceDB and contributor to Arrow, discusses the intersection of vector databases and AI. He highlights how Lance integrates seamlessly with data lakes, offering fast access and efficient schema evolution. The focus on the Lance file format showcases its advantages over traditional storage methods, particularly for multimedia tasks. Weston elaborates on optimizing latency in AI applications and the importance of user-friendly tools in the evolving vector store ecosystem.
6 snips
Oct 13, 2024 • 54min

The Role of Python in Shaping the Future of Data Platforms with DLT

Adrian Brudaru and Marcin Rudolf, co-founders of dltHub, share their insights on the transformative role of Python in data platforms. They discuss DLT as a versatile library integrating with lakehouses and AI frameworks. The duo highlights the impact of high-performance libraries like PyArrow on metadata management and parallel processing. They also explore the significance of interoperability and evolving governance challenges in data ingestion. Exciting plans for a portable data lake promise to enhance user access and experience in data management.
12 snips
Oct 6, 2024 • 43min

Build Your Data Transformations Faster And Safer With SDF

Lukas Schulte, Co-founder and CEO of SDF, dives into the revolutionary features of this SQL transformation tool designed for data privacy, governance, and quality. He discusses SDF's unique architecture built with Rust, enhancing both performance and reliability. Schulte explores the evolution of data transformation from static analysis to type-safe execution. He highlights the crucial role of classifiers in data governance and the ongoing development plans, including support for Python models, aimed at further improving developer workflows.
