

Data Engineering Podcast
Tobias Macey
This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.
Episodes

Oct 23, 2023 • 1h 4min
Defining A Strategy For Your Data Products
Ranjith Raghunath shares his thoughts on building a strategy for data products, including centralizing vs decentralizing data product strategy, managing technical debt, and the importance of metrics in data product strategy.

Oct 15, 2023 • 1h 8min
Reducing The Barrier To Entry For Building Stream Processing Applications With Decodable
This podcast episode discusses the challenges of building and maintaining stream processing infrastructure. It explores the evolution of streaming systems and the role of platforms like Decodable in simplifying stream processing. The speakers also delve into the challenges of stream processing applications, different ways to interact with Decodable, the importance of 'glue' in stream processing, and the biggest gap in data management tooling and technology.

Oct 9, 2023 • 52min
Using Data To Illuminate The Intentionally Opaque Insurance Industry
Max Cho, founder of a business that makes insurance policy selection more navigable, discusses the challenges of working with the opaque insurance industry. Topics include data collection and analysis, automating a manual industry, insurance pricing transparency, challenges of AI navigation, data preprocessing for analysis, understanding policy complexities, and the utility of large language models in the insurance industry.

Oct 1, 2023 • 52min
Building ETL Pipelines With Generative AI
This episode covers AI's impact on ETL processes, the use of generative AI for unstructured data, AI's role in ETL pipelines, experimenting with AI models, the evolving role of AI assistants in data engineering, the considerations and challenges of using AI in ETL pipelines, and the changing landscape of ETL tools.

Sep 25, 2023 • 59min
Powering Vector Search With Real Time And Incremental Vector Indexes
This episode discusses the growth of machine learning and the need for vector search capabilities. It explores the challenges of real-time indexes, the benefits of semantic search, and how to incorporate vector search into data flows. It also covers the considerations and limitations of vector search and shares insights on working with vector databases.

Sep 17, 2023 • 1h 2min
Building Linked Data Products With JSON-LD
In this podcast, Brian Platz discusses the concept and implications of linked data, the benefits of using JSON-LD for building semantic data products, the challenges faced in building linked data products, and the need for improved data management tools.

Sep 10, 2023 • 1h 1min
An Overview Of The State Of Data Orchestration In An Increasingly Complex Data Ecosystem
Nick Schrock, creator of Dagster, discusses the state of data orchestration technology and its application. The conversation explores the challenges and benefits of orchestrators, the balance between information and infrastructure, and the capabilities and challenges of data orchestration. It also covers low-code and no-code solutions in data work, their integration into software engineering, and the role of data orchestration in ML workflows.

Sep 4, 2023 • 42min
Eliminate The Overhead In Your Data Integration With The Open Source dlt Library
The podcast explores the dlt project, an open source Python library for data loading. It discusses the challenges in data integration, the benefits of dlt over other tools, and how to start building pipelines. Other topics include the journey of becoming a data engineer, performance considerations of using Python, collaboration in data integration, and integration with different runtimes. The hosts emphasize the need for better education in data management and practical solutions.

Aug 28, 2023 • 1h 1min
Building An Internal Database As A Service Platform At Cloudflare
This episode explores how Cloudflare provides PostgreSQL as a service to its developers for low-latency, high-uptime services at global scale. Topics include the challenges of maintaining high uptime and managing data volume, scaling considerations and load balancing strategies, the evolution of database engines, differences in version upgrades between Postgres and MySQL, innovative usage and challenges in building a database platform at Cloudflare, and lessons learned in building the system.

Aug 20, 2023 • 55min
Harnessing Generative AI For Creating Educational Content With Illumidesk
This episode covers generative AI in educational content creation, building a data-driven experience for learners, the challenges of dealing with large amounts of data, analyzing learner interactions to improve content development, data normalization and personalized learning paths, the implementation and architecture of the Illumidesk platform, the evolution of the platform and the incorporation of an LLM framework into its data engineering pipeline, and the application of Illumidesk for content creation.