
Data Engineering Podcast
This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.
Latest episodes

Apr 21, 2024 • 54min
Making Email Better With AI At Shortwave
Andrew Lee, Founder of Shortwave, discusses integrating AI into email to boost productivity. He shares technical challenges, benefits, and features of the product. Topics include challenges of email, optimizing AI models, embedding models, email synchronization, and transitioning to an email-focused product.

Apr 14, 2024 • 1h 16min
Designing A Non-Relational Database Engine
Oren Eini, CEO of RavenDB, discusses designing a non-relational database engine, comparing it to relational engines. Topics include key design considerations, data modeling approaches, performance differences, and the importance of transactions. They also explore the influence of generative AI on the database market and vector search functionalities, emphasizing simplicity, operational ease, and distributed architecture considerations in database engine design.

Apr 7, 2024 • 56min
Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer
Artyom Keydunov, creator of Cube, discusses the evolution and applications of the semantic layer in data platforms. Topics include challenges in defining metrics, implementing a semantic layer, transitioning from dbt to Cube, and the evolution of CubeJS to Rust. The episode also explores AI-driven data discovery tools for business consumers.

Mar 31, 2024 • 51min
Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary
This episode explores the importance of observability in dbt projects, with a focus on enhancing testing capabilities and anomaly detection. The conversation delves into the challenges data engineers face in building trust in data accuracy and the approach taken by Elementary to embed observability into the workflow.

Mar 24, 2024 • 56min
Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+
Discover how Dagster+ enhances data orchestration with declarative workflows, reducing the burden on data teams and enabling collaboration. Learn about the evolution of asset-oriented orchestration, data mesh concepts, and diverse industry use cases. Dive into the sustainable approach to the Dagster+ launch and considerations for choosing Dagster+ for data orchestration.

Mar 17, 2024 • 58min
Reconciling The Data In Your Databases With Datafold
The podcast delves into data reconciliation in databases, discussing error conditions and solutions to ensure data accuracy. Topics include challenges in data management, techniques for maintaining data quality, navigating reconciliation in warehouse migration projects, and strategies for cost management and data optimization. The episode also explores innovative uses of Datafold and the Data Diff utility in various sectors, the intersection of data engineering and AI applications, and advancements in tooling support for data engineers.

Mar 10, 2024 • 41min
Version Your Data Lakehouse Like Your Software With Nessie
Learn how the Nessie project bridges data lake and warehouse capabilities with versioning semantics similar to Git. Explore effective versioning and branching strategies, architecture of Nessie, and future development plans. Discover the advantages of using Nessie for data versioning in multi-table transactions within a data lakehouse setting.

Mar 3, 2024 • 46min
When And How To Conduct An AI Program
Colleen Tartow shares insights on conducting AI programs, emphasizing clarity in vision and business goals. The episode covers challenges in AI implementation, importance of quality data, operational shifts, and transformative potential of AI in various fields. Strategies for integrating AI systems, simplifying data pipelines, and focusing on customer benefits are also discussed.

Feb 25, 2024 • 56min
Find Out About The Technology Behind The Latest PFAD In Analytical Database Development
Learn about the evolution of InfluxDB and the use of Apache Arrow, Flight, DataFusion, and Parquet to accelerate database engine development. Explore the challenges and advancements in time series data analysis, open source components, the database engine stack, and technological developments in data management.

Feb 18, 2024 • 59min
Using Trino And Iceberg As The Foundation Of Your Data Lakehouse
Learn how Trino and Iceberg are revolutionizing the data lakehouse paradigm, combining the best of data lakes and warehouses. Hear about the challenges and advantages of using these technologies, and get insights on the future plans for the Trino platform.