
Data Engineering Podcast

Latest episodes

Dec 1, 2024 • 60min

The Art of Database Selection and Evolution

Sam Kleinman, a seasoned software engineer with experience at MongoDB, dives deep into the art of database selection. He discusses the critical trade-offs in database architectures and how they shape system design. Sam warns against the pitfalls of over-engineering and stresses leveraging database capabilities rather than pushing logic to the application layer. He identifies a significant gap in effective testing tools for database interactions, advocating for improved paradigms to ensure reliability. This insightful conversation blends technical expertise with practical advice for modern data management.
Nov 26, 2024 • 45min

Bridging Code and UI in Data Orchestration with Kestra

Anna Geller, Product Lead at Kestra and former data engineer at KPMG, dives into the fascinating realm of data orchestration. She explains how Kestra bridges the gap between coding and user interfaces, advocating for a hybrid low-code approach. Anna highlights the limitations of existing tools and how Kestra’s API-first design and scalable architecture tackle these challenges. The conversation also touches on the complexities of managing workflows, the role of real-time data, and the innovative functionalities that empower both technical and non-technical users.
Nov 18, 2024 • 40min

Streaming Data Into The Lakehouse With Iceberg And Trino At Going

Ken Pickering, VP of Engineering at Going, a service focused on finding the best travel deals, leads the team behind its data platform. He discusses the complexities of streaming data into a Trino and Iceberg lakehouse, drawing on his experience managing large flight datasets. Ken explains their dual approach to search, which combines passive and active strategies, and the technologies, such as Confluent and Databricks, that support their operations. He also highlights collaboration across engineering teams and the challenges of maintaining data quality and governance in a rapidly evolving landscape.
Nov 11, 2024 • 56min

An Opinionated Look At End-to-end Code Only Analytical Workflows With Bruin

Burak Karakan, co-founder of Bruin and a seasoned software engineer, discusses the advantages of a code-only approach to data workflows. He emphasizes how a unified data management tool can simplify analytics for mobile gaming companies. Burak details Bruin's open-source architecture, which allows small teams to efficiently manage their data sources. He also covers the evolution of the Bruin toolchain and its role in enhancing collaboration. Finally, he stresses the need for improved data quality and accessibility in analytical systems.
Nov 4, 2024 • 48min

Feldera: Bridging Batch and Streaming with Incremental Computation

Leonid Ryzhyk, CTO of Feldera, joins CEO Lalith Suresh and Chief Science Officer Mihai Budiu to dive into the world of incremental computation. They discuss how Feldera bridges batch and streaming data, and how that supports real-time machine learning applications such as fraud detection. The trio traces the architecture's evolution, emphasizing historical data analysis and feature engineering. They also address the skepticism surrounding traditional streaming technologies and explore Feldera's potential in both the open-source community and enterprise settings.
Oct 27, 2024 • 49min

Accelerate Migration Of Your Data Warehouse with Datafold's AI Powered Migration Agent

Gleb Mezhanskiy, CEO and co-founder of Datafold, draws on his data management experience at Autodesk and Lyft. He dives into the complexities of data migrations, detailing challenges such as technical debt and the need to verify parity between the source and target systems. Gleb explains how Datafold uses AI to automate data migration, significantly reducing the time and effort involved. He also discusses the importance of monitoring data integrity in real time and offers insights into choosing the right models for secure data handling.
Oct 20, 2024 • 58min

Bring Vector Search And Storage To The Data Lake With Lance

Weston Pace, a software engineer at LanceDB and a contributor to Apache Arrow, discusses the intersection of vector databases and AI. He explains how Lance integrates with data lakes, offering fast data access and efficient schema evolution. He details the advantages of the Lance file format over traditional storage formats, particularly for multimedia workloads. Weston also covers optimizing latency in AI applications and the importance of user-friendly tools in the evolving vector store ecosystem.
Oct 13, 2024 • 54min

The Role of Python in Shaping the Future of Data Platforms with DLT

Adrian Brudaru and Marcin Rudolf, co-founders of DLT Hub, share their insights on the role of Python in data platforms. They discuss DLT as a versatile library that integrates with lakehouses and AI frameworks. The duo highlights the impact of high-performance libraries such as PyArrow on metadata management and parallel processing. They also explore the significance of interoperability and the evolving governance challenges in data ingestion, and describe plans for a portable data lake intended to improve user access and experience in data management.
Oct 6, 2024 • 43min

Build Your Data Transformations Faster And Safer With SDF

Lukas Schulte, co-founder and CEO of SDF, dives into the features of this SQL transformation tool, which is designed around data privacy, governance, and quality. He discusses SDF's architecture, built in Rust for performance and reliability. Schulte explores the evolution of data transformation from static analysis to type-safe execution. He also highlights the role of classifiers in data governance and ongoing development plans, including support for Python models, aimed at further improving developer workflows.
Sep 23, 2024 • 57min

Scaling Airbyte: Challenges and Milestones on the Road to 1.0

Michel Tricot, co-founder and CEO of Airbyte, discusses the milestones leading to the platform's anticipated 1.0 launch. He shares how Airbyte evolved from simple beginnings to sophisticated integrations while responding to industry shifts and user feedback. Michel delves into the challenges of scaling an open-source product and into novel applications of Airbyte's technology, such as cache warm-up with Redis. He also highlights upcoming enhancements, including improved operational support and a Connector Marketplace.