
Data Engineering Podcast
This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.
Latest episodes

Feb 11, 2024 • 60min
Data Sharing Across Business And Platform Boundaries
Data sharing across business and platform boundaries is complex due to business rules, regulations, and technical considerations. Andrew Jefferson discusses building a robust system for data sharing, the techno-social considerations, and the Bobsled platform that aims to simplify the process. Topics include challenges of data sharing across cloud platforms, boundaries in data transfer systems, innovative applications of data sharing, shift left and shift right mentality, and the lack of AI and vector database solutions.

Feb 4, 2024 • 57min
Tackling Real Time Streaming Data With SQL Using RisingWave
The podcast discusses the RisingWave database engine for stream processing on S3, its architecture, challenges faced, and potential integration with a data lakehouse. It explores the use of Kafka for buffering and converting data formats, enhancing Postgres with real-time processing, and the differences in change data capture handling. The episode also covers workflow, onboarding, integration, and unexpected use cases of RisingWave in the manufacturing industry.

Jan 29, 2024 • 1h 3min
Build A Data Lake For Your Security Logs With Scanner
Learn about Scanner, a fast querying platform for security log data. Discover the challenges of managing data lakes and the benefits of using a search index. Explore the design philosophies of the Scanner platform and its integration into security log analysis workflows. Understand the indexing strategies for variegated data and the importance of regulatory compliance and data security. Also, find out about the need for better visibility and queryability in data management.

Jan 22, 2024 • 1h 2min
Modern Customer Data Platform Principles
This podcast explores the evolution of database technology, the role of Customer Data Platforms (CDPs) in the business data ecosystem, and architectural approaches to data management. The guest also discusses future plans for ActionIQ and the future of CDPs and data management technology.

Jan 7, 2024 • 50min
Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel
The guest, Jignesh Patel, discusses his research on technical scalability and user experience improvements in data management. They explore the challenges of meeting data demand, the limitations of Moore's Law, efficient data retrieval and indexing, strategies for real-world context, and future problems and challenges in complex systems and data processing. The guest also highlights the importance of data discovery in data management technology.

Jan 1, 2024 • 48min
Designing Data Platforms For Fintech Companies
CTO of fintech startup Monite discusses designing and implementing data platforms for the complexities of working with financial data. Topics include data challenges, regulatory requirements, managing backups, machine learning in fintech, reshaping accounting and customer support, and data governance challenges.

Dec 24, 2023 • 1h 15min
Troubleshooting Kafka In Production
Elad Eldor, author of 'Kafka: Troubleshooting in Production', discusses the challenges of operating Kafka at scale and ways to mitigate potential issues. Topics include the importance of Kafka in the data pipeline, doubling retention in Kafka, managed vs. self-managed Kafka clusters, data lake complexity, monitoring for Kafka, troubleshooting unreplicated partitions, the cost of running Kafka in the cloud, and the need for a correlation tool.

Dec 18, 2023 • 56min
Adding An Easy Mode For The Modern Data Stack With 5X
The podcast discusses the challenges of the modern data stack and how 5X is pre-integrating the best tools from each layer to solve these issues. It explores the need for a centralized control plane, strategic investments in data capabilities, and the benefits of consolidating to a single solution. The speaker also shares insights on platform building, simplifying user experience, and lessons learned in introducing a new product category.

Dec 11, 2023 • 51min
Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack
The podcast discusses the Anomstack project and its goal of providing simple anomaly detection. The episode explores the definition and prioritization of metrics, the implementation and optimization of Anomstack, and ways to extend the project's capabilities. It also touches on data lakes, Starburst analytics, and challenges in data management.

Dec 4, 2023 • 1h 4min
Designing Data Transfer Systems That Scale
In this podcast, Andrei Tserakhau discusses the challenges and solutions in data transfer and connectors, handling incremental data transfer and deduplication, building a user-friendly data transfer service, building scalable data engineering systems, building a data transfer product with change data capture and Airbyte connectors, and the challenges and gaps in data management tooling and technology.