The Data Engineering Show

Latest episodes

May 7, 2025 • 32min

How Rising Wave Is Redefining Real-Time Data with Postgres Power

In this episode of The Data Engineering Show, host Benjamin and co-host Eldad sit down with Yingjun Wu, founder and CEO of Rising Wave, to explore the evolution of stream processing systems and the innovations his company is bringing to the space.

What you’ll learn:
Yingjun's journey from academic research in stream processing to founding Rising Wave, and the challenges of building trust in a new database system.
How Rising Wave's architecture, using S3 as primary storage, scales within seconds, while other systems can take hours to scale.
The competitive landscape of stream processing, with Rising Wave's Postgres compatibility providing a significant advantage in ease of use.
How one major company reduced its CPU requirements from 20,000 to just 600 by switching from a traditional stream processing system to Rising Wave.
The rising importance of Apache Iceberg as a destination for stream processing output, helping companies avoid vendor lock-in.
How streaming systems fit into modern data stacks, especially as companies seek to avoid being locked into proprietary systems.

Yingjun Wu is the founder and CEO of Rising Wave, a stream processing system built in Rust and designed with a cloud-native architecture. With a PhD focused on stream processing and database systems, Yingjun previously worked at Redshift and IBM Research before founding Rising Wave. His company has developed a system that achieves significant performance and resource efficiency advantages over traditional stream processing solutions, while maintaining Postgres compatibility for ease of use.

Episode Highlights:

The Origins of Rising Wave (00:30)
Yingjun shares his background in stream processing from his PhD days and explains how his experience at Redshift revealed the need for better stream processing solutions, especially since many data warehouse workloads involve data ingested from streaming sources like Kinesis or Kafka.

Building a System from Scratch (04:10)
Yingjun describes the challenging first 2-3 years of developing Rising Wave without customers, highlighting how trust is a major barrier for new database systems. After 2.5 years, they secured their first customers, including a startup and several larger companies, which helped establish Rising Wave's credibility.

The Current Stream Processing Landscape (07:47)
Benjamin asks about the current stream processing space, with Yingjun positioning Rising Wave as a leader, particularly for SQL-based workloads. He highlights several key advantages of Rising Wave, including its Rust-based implementation and S3-based storage architecture.

S3 as Primary Storage (10:27)
Yingjun explains their decision to use S3 as primary storage from day one, despite its slowness and expense. He discusses how they've optimized for these challenges and would still make the same architectural choice today due to benefits like simplified state management and superior elastic scaling.

The Business Model (13:52)
Rising Wave offers open-source, cloud, and on-premise versions of its product. Yingjun notes that many highly regulated industries require on-premise deployment, including customers in the banking and aerospace sectors.

Typical Users and Competitive Advantages (15:01)
When asked about their typical users, Yingjun explains they directly compete with Flink but have advantages in ease of use due to Postgres compatibility. Their users are either new to stream processing or are migrating from systems like Spark Streaming or Flink due to performance issues or development complexity.

Apache Iceberg Integration (19:25)
Yingjun discusses how Apache Iceberg is emerging as an important destination for Rising Wave output, as companies seek to avoid vendor lock-in with proprietary data warehouses. He explains how Rising Wave typically performs ETL functions before data is sent to Iceberg tables.

The Future of Data Management (32:06)
The conversation concludes with a discussion about Iceberg becoming a "single source of truth" for data, with multiple specialized query engines potentially accessing the same data. Yingjun and Eldad share perspectives on how this shift away from proprietary data lock-in is changing the data ecosystem.

Episode Resources:
Rising Wave Website
Yingjun Wu LinkedIn

The Data Engineering Show is handcrafted by our friends over at: fame.so

Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of The Fundamentals of Data Engineering, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.

Check out our three most downloaded episodes:
Zach Wilson on What Makes a Great Data Engineer
Joe Reis and Matt Housley on The Fundamentals of Data Engineering
Bill Inmon, The Godfather of Data Warehousing
6 snips
Apr 8, 2025 • 24min

Revolutionizing Data Governance with DataStrato’s Unified Open Source Approach

Lisa Cao, Product Manager at DataStrato, dives into the world of data governance, sharing her expertise in AI/ML and open-source frameworks. The discussion highlights Apache Gravitino's unique capabilities, enabling unified governance across diverse data systems. They tackle the 'Push-Down Permission Management' model, essential for security, and the growing trend towards open ecosystems that prioritize flexibility. Lisa also emphasizes the importance of real-world tool adoption versus social media hype, keeping data engineers agile in a fast-paced landscape.
4 snips
Mar 19, 2025 • 31min

Database Technology in the Age of AI with DuckDB Labs co-creator Hannes Mühleisen

Hannes Mühleisen, CEO of DuckDB Labs and a professor in the Netherlands, discusses the innovative journey of DuckDB, an open-source analytical database that’s making waves with 10 million monthly downloads. He highlights how DuckDB differs from SQLite and its powerful analytical capabilities. Hannes also dives into the system's flexible ecosystem, allowing for custom functionalities. A fascinating discussion on AI’s impact on database management showcases the balance between traditional SQL usage and modern technological advancements.
4 snips
Feb 11, 2025 • 31min

AI and Data Movement: Trends and Best Practices with Estuary’s Daniel Pálma

Daniel Pálma, Head of Marketing at Estuary, shares his journey from data engineer to marketing professional. He discusses how his tech background enriches his marketing strategies. The conversation dives into the impact of AI on data movement and how cloud solutions are transforming data integration. Daniel highlights the rise of vector databases and structured data challenges in AI, as well as the promising future of Apache Iceberg in data lakehouses. Finally, he emphasizes the growing need for data practitioners in this golden age of data.
Jan 7, 2025 • 37min

AI and Data Change Management with Chad Sanderson, CEO Gable AI

In this episode of The Data Engineering Show, host Benjamin and co-host Eldad sit down with Chad Sanderson, CEO and co-founder of Gable AI, to explore the interesting world of data change management.

Join them as they:
Delve into the challenges of data quality, how it degrades over time, and the one-sided data quality checks on the “last mile” of the data supply chain.
Talk about how Gable works through a 3-layer flow of technology: identifying data production points, tracing the data flow, and communicating the impact of changes before they reach production.
Explain why the gap between data producers and consumers needs to be bridged, and how Gable continues to emphasize the need for effective communication and a shared understanding of data change management across teams.
Shine a light on how AI can enhance data management by extracting semantics from code and effectively managing the translation output.
Discuss Chad’s vision for 2025, which is to help companies start to care about data and how the changes made to data affect other people.

Chad Sanderson is the CEO and co-founder of Gable AI, a data change management platform. Chad has over a decade of experience in the data engineering and infrastructure space, holding significant roles at major companies like Microsoft, Oracle, and Sephora, where he focused on data quality and governance challenges. He is a former Head of Data at Convoy, a LinkedIn writer, and a published author. He lives in Seattle, Washington, and is the Chief Operator of the Data Quality Camp. His journey from data scientist to data engineer and ultimately to CEO was driven by a desire to transform how organizations manage and utilize data. Gable AI addresses the complexities of the data supply chain by providing tools for code scanning, data contracts, and governance as code, enabling teams to proactively manage data changes and their impact.

If you enjoyed this episode, make sure to subscribe, rate, and review it on Apple Podcasts, Spotify, and YouTube.

Episode Resources:
Gable AI website
Chad Sanderson on LinkedIn
Nov 26, 2024 • 25min

Tech Stacks and Tradeoffs: Xudo's Founder on Picking the Right Tools for BI Success

Wouter Trappers is the founder of Xudo and shares his slightly unconventional path from philosopher to data consultant with the Bros in this latest episode of The Data Engineering Show. Wouter’s grounding in philosophy has proved to be a shaping influence on his approach to business intelligence. For Wouter, BI is much more than just a software solution: it is all about change management and aligning leadership with data projects.

They discuss:
From Excel to Expert: From basic Excel tasks to a full mastery of BI tools like QlikView, Wouter has blended his technical and philosophical approaches to data to become a bona fide expert.
Data Strategy as Transformation: Good change management principles have to be adhered to if a BI project is going to bear fruit. Focus on leadership alignment, KPI clarity, and user empowerment instead of simply implementing software.
Challenges of Starting Small: Wouter has some tips to offer smaller companies around bootstrapping their data journey using existing tools, practical education, and even Gen AI.
Balancing Scales: Smaller startups face a very different set of challenges than large enterprises.

Wouter’s combination of philosophy and pragmatism brings fresh takes to building effective data solutions.
Oct 31, 2024 • 28min

Data Rewind: Conversation Highlights from Zach Wilson, Matthew Housley, Joe Reis, and Krishnan Viswanathan

In this special roundup episode of The Data Engineering Show, the Bros revisit some of the best bits from episodes with data thought leaders Zach Wilson, Matthew Housley, Joe Reis, and Krishnan Viswanathan, spotlighting essential trends and lessons learned across the evolving data engineering landscape. From data observability to bridging academia with real-world practice, this episode covers perspectives on where data engineering is heading and why certain challenges persist.

Topics include:
Foundations of Data Engineering: Zach Wilson emphasizes the importance of core, tech-agnostic skills in data modeling, quality assurance, and storytelling. By sharing his experiences at Airbnb and in education, he reveals that effective data engineering hinges on creating robust data models, quality controls, and persuasive narratives rather than expertise in any single tool or language.
Bridging Academia and Practice: Matthew Housley and Joe Reis delve into the need for better data education, emphasizing hands-on experience and data fundamentals over tool-specific training, and advocate for apprenticeships and real-world collaborations in educational settings.
Legacy Meets Modern in Data Engineering: Krishnan Viswanathan reflects on recurring themes in data engineering and the importance of adapting legacy approaches to new data needs, underscoring the challenges and benefits of vendor-built versus in-house solutions.

Join the Bros for a well-rounded exploration of current themes in data engineering, filled with practical advice for data professionals at any stage of their journey.
Sep 24, 2024 • 33min

The Resurgence of SQL: Insights from Ryanne Dolan from LinkedIn

In this episode of The Data Engineering Show, the bros, Eldad and Benjamin, are joined by Ryanne Dolan from LinkedIn to discuss the innovative Hoptimator (H2) project. This conversation reveals how LinkedIn has improved its data pipelines by automating the setup and management of complex workflows.

Together they cover:
Automated Data Pipelines: Ryanne explains how Hoptimator allows users to create and manage data pipelines using just a simple SQL SELECT query, streamlining the process of setting up Kafka topics, Flink jobs, and schemas.
Integration with Kubernetes: The project utilizes Kubernetes to handle infrastructure tasks, treating Kubernetes as a database for managing state. This integration simplifies the orchestration of data workflows and automates routine tasks.
Consumer-Driven Model: Ryanne discusses the shift from a producer-driven to a consumer-driven data model, emphasizing the importance of understanding and addressing consumer needs to reduce engineering complexity and optimize data systems.
Future of Data Engineering: The conversation touches on the ongoing experimental nature of Hoptimator and its potential to transform data engineering practices, highlighting its impact on LinkedIn's data infrastructure.
Jun 4, 2024 • 43min

Vector Databases Won’t Replace SQL - Andy Pavlo

SQL’s slow. SQL’s stupid. We hear these claims every time a new shiny tool enters the market, only to realize five years later when the hype dies down that SQL is actually a good idea. In this super techie episode of the Data Engineering Show, Andy Pavlo, Associate Professor at Carnegie Mellon University, joins the bros to delve into database internals and optimization. Andy discusses leveraging ML for autonomous database optimization, using Postgres for practical applications, tuning production databases safely, and why SQL is here to stay.
Apr 16, 2024 • 40min

How ZoomInfo transitioned from data graveyards to ROI-driven data projects

Too often, expensive resources and man-hours are spent on dashboards no one uses, resulting in zero ROI. Philip Zelitchenko, VP of Data & Analytics at ZoomInfo, met the bros to talk about adopting product management principles to ensure data projects have value, and to provide an unfiltered peek into ZoomInfo’s data stack and unique tech culture.
