
Data Engineering Podcast

Latest episodes

63 snips
Dec 23, 2024 • 50min

Building a Data Vision Board: A Guide to Strategic Planning

Lior Barak, a data expert with 15 years of experience in data product strategy, shares invaluable insights on strategic planning in data management. He introduces the concept of a 'data vision board' as a tool for organizations to align their data strategies with regulatory and stakeholder needs. Lior emphasizes the importance of balancing immediate demands with long-term goals, quantifying data issues for prioritization, and maintaining a flexible, living strategic plan. His practical advice encourages data teams to transition from mere enablers to impactful creators.
40 snips
Dec 16, 2024 • 60min

How Orchestration Impacts Data Platform Architecture

Hugo Lu, CEO and co-founder of Orchestra, delves into the vital role of data orchestration in platform architecture. He highlights how the choice of orchestration engines influences data flow management and overall efficiency. The discussion covers the evolution of orchestration from early models to modern applications like Kubernetes, reveals the challenges of traditional systems, and emphasizes the need for flexibility in architecture. Lu also addresses the distinct demands of analytical versus product-oriented applications, especially with the rise of AI integration.
70 snips
Dec 8, 2024 • 52min

An Exploration Of The Impediments To Reusable Data Pipelines

Max Beauchemin, a data engineer with two decades of experience and founder of Preset, dives into the complexities of reusable data pipelines. He discusses the "write everything twice" problem, emphasizing the need for collaboration and shared reference implementations. Max explores the challenges of managing diverse SQL dialects and the evolving role of data engineers, likening it to front-end development. He envisions generative AI aiding knowledge distribution and encourages the community to engage in sharing templates to drive innovation in the field.
45 snips
Dec 1, 2024 • 60min

The Art of Database Selection and Evolution

Sam Kleinman, a seasoned software engineer with experience at MongoDB, dives deep into the art of database selection. He discusses the critical trade-offs in database architectures and how they shape system design. Sam warns against the pitfalls of over-engineering and stresses leveraging database capabilities rather than pushing logic to the application layer. He identifies a significant gap in effective testing tools for database interactions, advocating for improved paradigms to ensure reliability. This insightful conversation blends technical expertise with practical advice for modern data management.
16 snips
Nov 26, 2024 • 45min

Bridging Code and UI in Data Orchestration with Kestra

Anna Geller, Product Lead at Kestra and former data engineer at KPMG, dives into the fascinating realm of data orchestration. She explains how Kestra bridges the gap between coding and user interfaces, advocating for a hybrid low-code approach. Anna highlights the limitations of existing tools and how Kestra’s API-first design and scalable architecture tackle these challenges. The conversation also touches on the complexities of managing workflows, the role of real-time data, and the innovative functionalities that empower both technical and non-technical users.
26 snips
Nov 18, 2024 • 40min

Streaming Data Into The Lakehouse With Iceberg And Trino At Going

Ken Pickering, VP of Engineering at Going, leads a data platform team focused on finding the best travel deals. He discusses the complexities of streaming data into a Trino and Iceberg lakehouse, sharing his experience in managing vast flight datasets. Ken elaborates on their dual approach to search (passive and active) and on technologies like Confluent and Databricks that support their operations. He highlights collaboration across the engineering teams and the challenges of maintaining data quality and governance in a rapidly evolving landscape.
8 snips
Nov 11, 2024 • 56min

An Opinionated Look At End-to-end Code Only Analytical Workflows With Bruin

Burak Karakan, co-founder of Bruin and a seasoned software engineer, discusses the advantages of a code-only approach to data workflows. He emphasizes how a unified data management tool can simplify analytics for mobile gaming companies. Burak details Bruin's open-source architecture, which allows small teams to efficiently manage their data sources. He also covers the evolution of the Bruin toolchain and its role in enhancing collaboration. Finally, he stresses the need for improved data quality and accessibility in analytical systems.
25 snips
Nov 4, 2024 • 48min

Feldera: Bridging Batch and Streaming with Incremental Computation

Feldera's CTO Leonid Ryzhyk, CEO Lalith Suresh, and Chief Science Officer Mihai Budiu dive into the world of incremental computation. They discuss how Feldera bridges batch and streaming data seamlessly, revolutionizing real-time machine learning applications like fraud detection. The trio highlights the architecture's evolution, emphasizing historical data analysis and feature engineering. They also tackle the skepticism surrounding traditional streaming technologies and explore Feldera's potential in both the open-source community and enterprise solutions.
19 snips
Oct 27, 2024 • 49min

Accelerate Migration Of Your Data Warehouse with Datafold's AI Powered Migration Agent

Gleb Mezhanskiy, CEO and co-founder of Datafold, shares his extensive experience in data management from his time at Autodesk and Lyft. He dives into the complexities of data migrations, detailing challenges like technical debt and the need for effective parity between systems. Gleb reveals how Datafold leverages AI to automate data migration processes, significantly reducing time and effort. He also discusses the importance of monitoring data integrity in real time and offers insights into choosing the right models for secure data handling.
4 snips
Oct 20, 2024 • 58min

Bring Vector Search And Storage To The Data Lake With Lance

Weston Pace, a software engineer at LanceDB and contributor to Arrow, discusses the intersection of vector databases and AI. He highlights how Lance integrates seamlessly with data lakes, offering fast access and efficient schema evolution. The focus on the Lance file format showcases its advantages over traditional storage methods, particularly for multimedia tasks. Weston elaborates on optimizing latency in AI applications and the importance of user-friendly tools in the evolving vector store ecosystem.
