Data Engineering Podcast

Tobias Macey
22 snips
Mar 10, 2024 • 41min

Version Your Data Lakehouse Like Your Software With Nessie

Learn how the Nessie project bridges data lake and warehouse capabilities with versioning semantics similar to Git. Explore effective versioning and branching strategies, architecture of Nessie, and future development plans. Discover the advantages of using Nessie for data versioning in multi-table transactions within a data lakehouse setting.
14 snips
Mar 3, 2024 • 46min

When And How To Conduct An AI Program

Colleen Tartow shares insights on conducting AI programs, emphasizing clarity in vision and business goals. The episode covers challenges in AI implementation, importance of quality data, operational shifts, and transformative potential of AI in various fields. Strategies for integrating AI systems, simplifying data pipelines, and focusing on customer benefits are also discussed.
4 snips
Feb 25, 2024 • 56min

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Learn about the evolution of InfluxDB and the use of Apache Arrow, Flight, DataFusion, and Parquet to accelerate database engine development. Explore the challenges and advancements in time series data analysis, open source components, the database engine stack, and technological developments in data management.
83 snips
Feb 18, 2024 • 59min

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Learn how Trino and Iceberg are revolutionizing the data lakehouse paradigm, combining the best of data lakes and warehouses. Hear about the challenges and advantages of using these technologies, and get insights on the future plans for the Trino platform.
7 snips
Feb 11, 2024 • 60min

Data Sharing Across Business And Platform Boundaries

Data sharing across business and platform boundaries is complex due to business rules, regulations, and technical considerations. Andrew Jefferson discusses building a robust system for data sharing, the techno-social considerations, and the Bobsled platform that aims to simplify the process. Topics include challenges of data sharing across cloud platforms, boundaries in data transfer systems, innovative applications of data sharing, shift left and shift right mentality, and the lack of AI and vector database solutions.
Feb 4, 2024 • 57min

Tackling Real Time Streaming Data With SQL Using RisingWave

The podcast discusses the RisingWave database engine for stream processing on S3, its architecture, challenges faced, and potential integration with a data lakehouse. It explores the use of Kafka for buffering and converting data formats, enhancing Postgres with real-time processing, and the differences in change data capture handling. The episode also covers workflow, onboarding, integration, and unexpected use cases of RisingWave in the manufacturing industry.
4 snips
Jan 29, 2024 • 1h 3min

Build A Data Lake For Your Security Logs With Scanner

Learn about Scanner, a fast querying platform for security log data. Discover the challenges of managing data lakes and the benefits of using a search index. Explore the design philosophies of the Scanner platform and its integration into security log analysis workflows. Understand the indexing strategies for variegated data and the importance of regulatory compliance and data security. Also, find out about the need for better visibility and queryability in data management.
7 snips
Jan 22, 2024 • 1h 2min

Modern Customer Data Platform Principles

This podcast explores the evolution of database technology, the role of Customer Data Platforms (CDPs) in the business data ecosystem, and architectural approaches to data management. The guest also discusses future plans for ActionIQ and the future of CDPs and data management technology.
6 snips
Jan 7, 2024 • 50min

Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel

Jignesh Patel discusses his research on technical scalability and user experience improvements in data management. The conversation explores the challenges of meeting data demand, the limitations of Moore's Law, efficient data retrieval and indexing, strategies for real-world context, and future problems and challenges in complex systems and data processing. He also highlights the importance of data discovery in data management technology.
6 snips
Jan 1, 2024 • 48min

Designing Data Platforms For Fintech Companies

CTO of fintech startup Monite discusses designing and implementing data platforms for the complexities of working with financial data. Topics include data challenges, regulatory requirements, managing backups, machine learning in fintech, reshaping accounting and customer support, and data governance challenges.
