Data Engineering Podcast

Latest episodes

4 snips
Jan 13, 2025 • 55min

CSVs Will Never Die And OneSchema Is Counting On It

Andrew Luo, CEO of OneSchema, shares his expertise in data engineering and CRM migration, focusing on the enduring relevance of CSVs. He discusses the common challenges of CSV data, such as inconsistency and lack of standards, and explains how OneSchema uses AI for improved type validation and parsing. Andrew highlights OneSchema's potential to streamline data imports and boost efficiency, particularly for non-technical users. He also reveals plans for future innovations, including industry-specific transformation packs to enhance data management further.
34 snips
Jan 3, 2025 • 58min

Breaking Down Data Silos: AI and ML in Master Data Management

Dan Bruckner, Co-founder and CTO of Tamr and former CERN physicist, shares his insights into master data management (MDM) enhanced by AI and machine learning. He discusses his transition from physics to data science, highlighting challenges in reconciling large data sets. Dan explains how data silos form within organizations and emphasizes the role of large language models in improving user experience and data trust. He advocates for combining AI capabilities with human oversight to ensure accuracy while tackling complex data management issues.
57 snips
Dec 23, 2024 • 50min

Building a Data Vision Board: A Guide to Strategic Planning

Lior Barak, a data expert with 15 years of experience in data product strategy, shares invaluable insights on strategic planning in data management. He introduces the concept of a 'data vision board' as a tool for organizations to align their data strategies with regulatory and stakeholder needs. Lior emphasizes the importance of balancing immediate demands with long-term goals, quantifying data issues for prioritization, and maintaining a flexible, living strategic plan. His practical advice encourages data teams to transition from mere enablers to impactful creators.
33 snips
Dec 16, 2024 • 60min

How Orchestration Impacts Data Platform Architecture

Hugo Lu, CEO and co-founder of Orchestra, delves into the vital role of data orchestration in platform architecture. He highlights how the choice of orchestration engines influences data flow management and overall efficiency. The discussion covers the evolution of orchestration from early models to modern applications like Kubernetes, reveals the challenges of traditional systems, and emphasizes the need for flexibility in architecture. Lu also addresses the distinct demands of analytical versus product-oriented applications, especially with the rise of AI integration.
70 snips
Dec 8, 2024 • 52min

An Exploration Of The Impediments To Reusable Data Pipelines

Max Beauchemin, a data engineer with two decades of experience and founder of Preset, dives into the complexities of reusable data pipelines. He discusses the "write everything twice" problem, emphasizing the need for collaboration and shared reference implementations. Max explores the challenges of managing diverse SQL dialects and the evolving role of data engineers, likening it to front-end development. He envisions generative AI aiding knowledge distribution and encourages the community to engage in sharing templates to drive innovation in the field.
44 snips
Dec 1, 2024 • 60min

The Art of Database Selection and Evolution

Sam Kleinman, a seasoned software engineer with experience at MongoDB, dives deep into the art of database selection. He discusses the critical trade-offs in database architectures and how they shape system design. Sam warns against the pitfalls of over-engineering and stresses leveraging database capabilities rather than pushing logic to the application layer. He identifies a significant gap in effective testing tools for database interactions, advocating for improved paradigms to ensure reliability. This insightful conversation blends technical expertise with practical advice for modern data management.
15 snips
Nov 26, 2024 • 45min

Bridging Code and UI in Data Orchestration with Kestra

Anna Geller, Product Lead at Kestra and former data engineer at KPMG, dives into the fascinating realm of data orchestration. She explains how Kestra bridges the gap between coding and user interfaces, advocating for a hybrid low-code approach. Anna highlights the limitations of existing tools and how Kestra’s API-first design and scalable architecture tackle these challenges. The conversation also touches on the complexities of managing workflows, the role of real-time data, and the innovative functionalities that empower both technical and non-technical users.
26 snips
Nov 18, 2024 • 40min

Streaming Data Into The Lakehouse With Iceberg And Trino At Going

Ken Pickering, VP of Engineering at Going, leads a data platform team focused on finding the best travel deals. He discusses the complexities of streaming data into a Trino and Iceberg lakehouse, sharing his experience in managing vast flight datasets. Ken elaborates on their dual approach to search strategies—passive and active—and the technologies like Confluent and Databricks that support their operations. He highlights collaboration within the engineering teams and the challenges of maintaining data quality and governance in a rapidly evolving landscape.
8 snips
Nov 11, 2024 • 56min

An Opinionated Look At End-to-end Code Only Analytical Workflows With Bruin

Burak Karakan, co-founder of Bruin and a seasoned software engineer, discusses the advantages of a code-only approach to data workflows. He emphasizes how a unified data management tool can simplify analytics for mobile gaming companies. Burak details Bruin's open-source architecture, which allows small teams to efficiently manage their data sources. He also covers the evolution of the Bruin toolchain and its role in enhancing collaboration. Finally, he stresses the need for improved data quality and accessibility in analytical systems.
25 snips
Nov 4, 2024 • 48min

Feldera: Bridging Batch and Streaming with Incremental Computation

Feldera's CTO Leonid Ryzhyk, CEO Lalith Suresh, and Chief Science Officer Mihai Budiu dive into the world of incremental computation. They discuss how Feldera bridges batch and streaming data and what that means for real-time machine learning applications like fraud detection. The trio highlights the architecture's evolution, emphasizing historical data analysis and feature engineering. They also tackle the skepticism surrounding traditional streaming technologies and explore Feldera's potential in both the open-source community and enterprise solutions.

