Data Engineering Podcast

Latest episodes

Jun 11, 2025 • 44min

AI and the Lakehouse: How Starburst is Pioneering New Workflows

Alex Albu, tech lead for AI initiatives at Starburst, dives into the fascinating world of integrating AI workloads with lakehouse architecture. He shares his journey from software engineering to championing AI enhancements at Starburst. The discussion covers innovative solutions like AI agents for data exploration and metadata enrichment. Alex addresses the hurdles of marrying AI with traditional data systems and reveals future visions for improved data formats and AI-driven tools, promising a revolution in data management.
49 snips
Jun 3, 2025 • 1h 1min

Amazon S3: The Backbone of Modern Data Systems

Mai-Lan Tomsen Bukovec, Vice President of Technology at AWS, reveals the remarkable evolution of Amazon S3, a key data repository since 2006. She shares fascinating insights on how S3 transformed from basic storage into the backbone of modern data architecture, enabling scalable data lakes. Discussion includes the importance of metadata, the integration of S3 with Iceberg, and innovations like strong consistency. Companies like Adobe and Netflix leverage S3 for efficiency, showcasing its role in navigating both structured and unstructured data challenges.
22 snips
May 29, 2025 • 42min

Scaling Data Operations With Platform Engineering

Chakravarti Kotaru, a data platform director at a leading online travel company who has spent nearly two decades working on scalable architectures, shares his expertise on data operations. He discusses the evolution from DevOps to platform engineering, emphasizing centralized management and automation with AWS Service Catalog. Kotaru elaborates on the challenges of database migrations, the integration of AI and ML for enhanced efficiency, and the vital role of organizational buy-in in successful data initiatives. A fascinating glimpse into modern data platform strategies!
43 snips
May 21, 2025 • 50min

From Data Discovery to AI: The Evolution of Semantic Layers

Shinji Kim, Founder and CEO of Select Star, shares insights on the evolving role of semantic layers in AI. She discusses the journey from statistical analysis to data governance, highlighting challenges enterprises face with data access. The conversation covers the shift from centralized to decentralized data teams and the importance of metadata management. Shinji emphasizes the critical role of semantic modeling for business intelligence and how AI can enhance data accuracy. She also explores the future of semantic modeling in data warehouses, addressing operationalization challenges.
10 snips
May 13, 2025 • 46min

Balancing Off-the-Shelf and Custom Solutions in Data Engineering

Tulika Bhatt, a senior software engineer at Netflix specializing in impression data, shares her journey from BlackRock and Verizon to shaping data services at a top streaming service. She discusses the challenges of balancing off-the-shelf solutions with custom systems, utilizing technologies like Spark and Flink. Tulika dives into the intricacies of ensuring data quality and observability, emphasizing automation and robust alerting strategies. She also explores the integration of AI in data engineering, highlighting its potential and the hurdles faced in maximizing efficiency.
9 snips
May 5, 2025 • 60min

StarRocks: Bridging Lakehouse and OLAP for High-Performance Analytics

Sida Shen, a product manager at CelerData and a contributor to StarRocks, dives into the innovative world of high-performance analytical databases. He shares the origins of StarRocks, illustrating its evolution from Apache Doris into a robust lakehouse query engine. Topics include handling high-concurrency, low-latency queries, bridging traditional OLAP with lakehouse architecture, and the importance of integration with formats like Apache Iceberg. Sida also emphasizes the challenges of denormalization and real-time data processing in modern analytics.
12 snips
Apr 28, 2025 • 1h 13min

Exploring NATS: A Multi-Paradigm Connectivity Layer for Distributed Applications

Derek Collison, the creator of NATS and CEO of Synadia, shares insights from his impressive background at Google and VMware. He discusses how NATS revolutionizes messaging systems with innovative features like the circuit breaker pattern and JetStream. Derek highlights NATS's advantages in edge computing, emphasizing its resilience and data persistence capabilities. He also addresses the challenges of open-source technology and shares thoughts on the future of connectivity in modern distributed systems, proving NATS's versatility across various industries.
37 snips
Apr 21, 2025 • 57min

Advanced Lakehouse Management With The LakeKeeper Iceberg REST Catalog

Viktor Kessler, co-founder of Vakamo and developer of Lakekeeper, dives into the world of advanced lakehouse management with a focus on Apache Iceberg. He discusses the pivotal role of metadata in data actionability and the evolution of data catalogs. Viktor highlights innovative features of Lakekeeper, like integration with OpenFGA for access control and its deployment using Rust on Kubernetes. He also addresses the challenges of migrating data catalogs and the importance of community involvement in open-source projects for better data management.
65 snips
Apr 12, 2025 • 40min

Simplifying Data Pipelines with Durable Execution

In this engaging conversation, Jeremy Edberg, CEO of DBOS and former tech lead at companies like Reddit and Netflix, discusses the vital concept of durable execution in data systems. He reveals how DBOS's serverless platform enhances local resilience and simplifies the intricacies of data pipelines. Jeremy emphasizes the significance of version management for long-running workflows and introduces the Transact library, which boosts reliability and efficiency. This episode is a treasure trove for anyone interested in optimizing data workflows and reducing operational headaches.
22 snips
Mar 30, 2025 • 44min

Overcoming Redis Limitations: The Dragonfly DB Approach

Roman Gershman, CTO and founder of Dragonfly DB, shares his journey from Google to creating a high-speed alternative to Redis. He dives into the challenges of developing in-memory databases, focusing on performance, scalability, and cost efficiency. Roman discusses operational complexities users face, while highlighting Dragonfly's compatibility with Redis and innovations like SSD tiering. He also explores programming trade-offs between C++ and Rust, emphasizing adaptability in database development and the importance of community feedback in shaping future advancements.