

Data Engineering Weekly
Ananth Packkildurai
The Weekly Data Engineering Newsletter www.dataengineeringweekly.com
Episodes

Aug 13, 2025 • 45min
Insights from Jacopo Tagliabue, CTO of Bauplan: Revolutionizing Data Pipelines with Functional Data Engineering
Data Engineering Weekly recently hosted Jacopo Tagliabue, CTO of Bauplan, for an insightful podcast exploring innovative solutions in data engineering. Jacopo shared valuable perspectives drawn from his entrepreneurial journey, his experience building multiple companies, and his deep understanding of data engineering challenges. This extensive conversation spanned the complexities of data engineering and showcased Bauplan’s unique approach to tackling industry pain points.

Entrepreneurial Journey and Problem Identification
Jacopo opened the discussion by highlighting the personal and professional experiences that led him to create Bauplan. Previously, he built a company specializing in Natural Language Processing (NLP) at a time when NLP was still maturing as a technology. After selling this initial venture, Jacopo immersed himself deeply in data engineering, navigating complex infrastructures involving Apache Spark, Airflow, and Snowflake.
He recounted the profound frustration of managing complicated, monolithic data stacks that, despite their capabilities, came with significant operational overhead. Transitioning from Apache Spark to Snowflake provided some relief, yet it introduced new limitations, particularly concerning Python integration and vendor lock-in. Recognizing the industry-wide need for simplicity, Bauplan was conceived to offer engineers a straightforward and efficient alternative.

Bauplan’s Core Abstraction - Functions vs. Traditional ETL
At Bauplan’s core is the decision to use functions as the foundational building block. Jacopo explained how traditional ETL methodologies typically demand extensive infrastructure management and impose high cognitive overhead on engineers. By contrast, functions offer a much simpler, modular approach. They enable data engineers to focus purely on business logic without worrying about complex orchestration or infrastructure.
The Bauplan approach distinctly separates responsibilities: engineers handle code and business logic, while Bauplan’s platform takes charge of data management, caching, versioning, and infrastructure provisioning. Jacopo emphasized that this separation significantly enhances productivity and allows engineers to operate efficiently, creating modular and easily maintainable pipelines.

Data Versioning, Immutability, and Reproducibility
Jacopo firmly underscored the importance of immutability and reproducibility in data pipelines. He explained that data engineering has historically struggled with precisely reproducing pipeline states, which is especially critical during debugging and auditing. Bauplan directly addresses these challenges by automatically creating immutable snapshots of both the code and data state for every job executed. Each pipeline execution receives a unique job ID, guaranteeing precise reproducibility of the pipeline’s state at any given moment.
This method enables engineers to debug issues without impacting the production environment, thereby streamlining maintenance tasks and enhancing reliability. Jacopo highlighted this capability as central to achieving robust data governance and operational clarity.

Branching, Collaboration, and Conflict Resolution
Bauplan integrates Git-like branching capabilities tailored explicitly for data engineering workflows. Jacopo detailed how this capability allows engineers to experiment, collaborate, and innovate safely in isolated environments. Branching provides an environment where engineers can iterate without fear of disrupting ongoing operations or production pipelines.
Jacopo explained that Bauplan handles conflicts conservatively. If two engineers attempt to modify the same table or data concurrently, Bauplan requires explicit rebasing. While this strict conflict-resolution policy may appear cautious, it maintains data integrity and prevents unexpected race conditions. Bauplan ensures that each branch is appropriately isolated, promoting clean, structured collaboration.

Apache Arrow and Efficient Data Shuffling
Efficiency in data shuffling, especially between pipeline functions, was another critical topic. Jacopo praised Apache Arrow’s role as the backbone of Bauplan’s data interchange strategy. Apache Arrow’s zero-copy transfer capability significantly boosts data movement speed, removing traditional bottlenecks associated with serialization and data transfers.
Jacopo illustrated how Bauplan leverages Apache Arrow to facilitate rapid data exchanges between functions, dramatically outperforming traditional systems like Airflow. By eliminating the need for intermediate serialization, Bauplan achieves significant performance improvements and streamlines data processing, enabling rapid, efficient pipelines.

Vertical Scaling and System Performance
Finally, the conversation shifted to the vertical scaling strategy employed by Bauplan. Unlike horizontally distributed systems such as Apache Spark, Bauplan strategically focuses on vertical scaling, which simplifies infrastructure management and optimizes resource utilization. Jacopo explained that modern cloud infrastructures now offer large compute instances capable of handling substantial data volumes efficiently, negating the complexity typically associated with horizontally distributed systems.
He described Bauplan’s current sweet spot as pipelines with data volumes typically between 10GB and 100GB. This range covers the vast majority of standard enterprise use cases, making Bauplan a highly suitable and effective solution for most organizations. Jacopo stressed that although certain specialized scenarios still require distributed computing platforms, the majority of pipelines benefit immensely from Bauplan’s simplified, vertically scaled approach.

In summary, Jacopo Tagliabue offered a compelling vision of Bauplan’s mission to simplify and enhance data engineering through functional abstractions, immutable data versioning, and efficient, vertically scaled operations. Bauplan presents an innovative solution designed explicitly around the real-world challenges data engineers face, promising significant improvements in reliability, performance, and productivity in managing modern data pipelines. This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.dataengineeringweekly.com
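To make the functions-as-building-blocks idea concrete, here is a minimal sketch in plain Python. This is not Bauplan’s actual SDK: the function names and the run_pipeline runner are hypothetical, and only the general pattern reflects what Jacopo described - small functions exchanging Apache Arrow tables, with the platform assumed to handle versioning, caching, and infrastructure behind the scenes.

```python
import pyarrow as pa
import pyarrow.compute as pc


def clean_orders(orders: pa.Table) -> pa.Table:
    # Keep only rows with a non-null amount; pure business logic, no infrastructure code.
    return orders.filter(pc.is_valid(orders["amount"]))


def daily_revenue(orders: pa.Table) -> pa.Table:
    # Aggregate cleaned orders by day.
    return orders.group_by("order_date").aggregate([("amount", "sum")])


def run_pipeline(raw: pa.Table) -> pa.Table:
    # Hypothetical runner: chains the functions, passing Arrow tables between
    # them so no row-by-row serialization happens between steps.
    return daily_revenue(clean_orders(raw))


if __name__ == "__main__":
    raw = pa.table({
        "order_date": ["2025-08-01", "2025-08-01", "2025-08-02"],
        "amount": [10.0, None, 25.0],
    })
    print(run_pipeline(raw))
```

In a platform like Bauplan, each such run would additionally be captured as an immutable snapshot with a job ID, which is what makes the reproducibility and branching story described above possible.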

Apr 25, 2025 • 37min
AI and Data in Production: Insights from Avinash Narasimha [AI Solutions Leader at Koch Industries]
In our latest episode of Data Engineering Weekly, co-hosted by Aswin, we explored the practical realities of AI deployment and data readiness with our distinguished guest, Avinash Narasimha, AI Solutions Leader at Koch Industries. This discussion shed significant light on the maturity, challenges, and potential that generative AI and data preparedness present in contemporary enterprises.

Introducing Our Guest: Avinash Narasimha
Avinash Narasimha is a seasoned professional with over two decades of experience in data analytics, machine learning, and artificial intelligence. His focus at Koch Industries involves deploying and scaling various AI solutions, with particular emphasis on operational AI and generative AI. His insights stem from firsthand experience in developing robust AI frameworks that are actively deployed in real-world applications.

Generative AI in Production: Reality vs. Hype
One key question often encountered in the industry revolves around the maturity of generative AI in actual business scenarios. Addressing this concern directly, Avinash confirmed that generative AI has indeed crossed the pilot threshold and is actively deployed in several production scenarios at Koch Industries. Highlighting their early adoption strategy, Avinash explained that they have been on this journey for over two years, emphasizing an established continuous feedback loop as a critical component in maintaining effective generative AI operations.

Production Readiness and Deployment
Deployment strategies for AI, particularly for generative models and agents, have undergone significant evolution. Avinash described the systematic approach based on his experience:
* Beginning with rigorous experimentation
* Transitioning smoothly into scalable production environments
* Incorporating robust monitoring and feedback mechanisms
The result is the successful deployment of multiple generative AI solutions, each carefully managed and continuously improved through iterative processes.

The Centrality of Data Readiness
During our conversation, we explored the significance of data readiness, a pivotal factor that influences the success of AI deployment. Avinash emphasized data readiness as a fundamental component that significantly impacts the timeline and effectiveness of integrating AI into production systems.
He emphasized the following:
- Data Quality: Consistent and high-quality data is crucial. Poor data quality frequently acts as a bottleneck, restricting the performance and reliability of AI models.
- Data Infrastructure: A robust data infrastructure is necessary to support the volume, velocity, and variety of data required by sophisticated AI models.
- Integration and Accessibility: The ease of integrating and accessing data within the organization significantly accelerates AI adoption and effectiveness.

Challenges in Data Readiness
Avinash openly discussed challenges that many enterprises face concerning data readiness, including fragmented data ecosystems, legacy systems, and inadequate data governance. He acknowledged that while the journey toward optimal data readiness can be arduous, organizations that systematically address these challenges see substantial improvements in their AI outcomes.

Strategies for Overcoming Data Challenges
Avinash also offered actionable insights into overcoming common data-related obstacles:
- Building Strong Data Governance: A robust governance framework ensures that data remains accurate, secure, and available when needed, directly enhancing AI effectiveness.
- Leveraging Cloud Capabilities: He noted recent developments in cloud-based infrastructure as significant enablers, providing scalable and sophisticated tools for data management and model deployment.
- Iterative Improvement: Regular feedback loops and iterative refinement of data processes help gradually enhance data readiness and AI performance.

Future Outlook: Trends and Expectations
Looking ahead, Avinash predicted increased adoption of advanced generative AI tools and emphasized ongoing improvements in model interpretability and accountability. He expects enterprises will increasingly prioritize explainable AI, balancing performance with transparency to maintain trust among stakeholders.
Moreover, Avinash highlighted the anticipated evolution of data infrastructure to become more flexible and adaptive, catering specifically to the unique demands of generative AI applications. He believes this evolution will significantly streamline the adoption of AI across industries.

Key Takeaways
- Generative AI is Ready for Production: Organizations, particularly those that have been proactive in their adoption, have successfully integrated generative AI into production, highlighting its maturity beyond experimental stages.
- Data Readiness is Crucial: Effective AI deployment is heavily dependent on the quality, accessibility, and governance of data within organizations.
- Continuous Improvement: Iterative feedback and continuous improvements in data readiness and AI deployment strategies significantly enhance performance and outcomes.

Closing Thoughts
Our discussion with Avinash Narasimha provided practical insights into the real-world implementation of generative AI and the critical role of data readiness. His experience at Koch Industries illustrates not only the feasibility but also the immense potential generative AI holds for enterprises willing to address data challenges and deploy AI thoughtfully and systematically.
Stay tuned for more insightful discussions on Data Engineering Weekly.
All rights reserved, ProtoGrowth Inc., India. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers’ opinions. This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.dataengineeringweekly.com

Mar 6, 2025 • 42min
Is Apache Iceberg the New Hadoop? Navigating the Complexities of Modern Data Lakehouses
The discussion examines Apache Iceberg's potential as a modern alternative to Hadoop. It tackles the small-file problem in data lakes and how Iceberg manages it, as well as the operational challenges organizations face during implementation. Key comparisons are drawn with other table formats such as Hudi and Delta Lake, underlining the importance of vendor support. The conversation also highlights the complexities of adopting Iceberg compared with traditional solutions, emphasizing the need for user-friendly tools and proof-of-concept projects.
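For readers curious what "managing the small-file problem" looks like in practice, the sketch below shows one standard answer: Iceberg's rewrite_data_files maintenance procedure, which compacts many small data files into fewer, larger ones, invoked here from PySpark. The catalog name (demo), table name (db.events), warehouse path, and target file size are placeholders, and the Iceberg Spark runtime is assumed to be on the classpath.

```python
from pyspark.sql import SparkSession

# Spark session wired up for an Iceberg catalog (all names are placeholders).
spark = (
    SparkSession.builder
    .appName("iceberg-compaction-sketch")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "s3://my-bucket/warehouse")
    .getOrCreate()
)

# Compact the table's data files, targeting ~512 MB output files.
spark.sql("""
    CALL demo.system.rewrite_data_files(
        table => 'db.events',
        options => map('target-file-size-bytes', '536870912')
    )
""").show()
```

Scheduling this kind of maintenance is part of the operational burden the episode discusses; it does not happen automatically just because a table uses Iceberg.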

Feb 26, 2025 • 1h 3min
The State of Lakehouse Architecture: A Conversation with Roy Hassan on Maturity, Challenges, and Future Trends
Lakehouse architecture represents a major evolution in data engineering. It combines data lakes' flexibility with data warehouses' structured reliability, providing a unified platform for diverse data workloads ranging from traditional business intelligence to advanced analytics and machine learning. Roy Hassan, a product leader at Upsolver, now Qlik, offers a comprehensive reality check on Lakehouse implementations, shedding light on their maturity, challenges, and future directions.

Defining Lakehouse Architecture
A Lakehouse is not a specific product, tool, or service but an architectural framework. This distinction is critical because it allows organizations to tailor implementations to their needs and technological environments. For instance, Databricks users inherently adopt a Lakehouse approach by storing data in object storage, managing it with the Delta Lake format, and analyzing it directly on the data lake.

Assessing the Maturity of Lakehouse Implementations
The adoption and maturity of Lakehouse implementations vary across cloud platforms and ecosystems:
- Databricks: Many organizations have built mature Lakehouse implementations using Databricks, leveraging its robust capabilities to handle diverse workloads.
- Amazon Web Services (AWS): While AWS provides services like Athena, Glue, Redshift, and EMR to access and process data in object storage, many users still rely on traditional data lakes built on Parquet files. However, a growing number are adopting Lakehouse architectures with open table formats such as Iceberg, which has gained traction within the AWS ecosystem.
- Azure Fabric: Built on the Delta Lake format, Azure Fabric offers a vertically integrated Lakehouse experience, seamlessly combining storage, cataloging, and computing resources.
- Snowflake: Organizations increasingly use Snowflake in a Lakehouse-oriented manner, storing data in S3 and managing it with Iceberg. While new workloads favor Iceberg, most existing data remains within Snowflake’s internal storage.
- Google BigQuery: The Lakehouse ecosystem in Google Cloud is still evolving. Many users prefer to keep their workloads within BigQuery due to its simplicity and integrated storage.
Despite these differences in maturity, the industry-wide adoption of Lakehouse architectures continues to expand, and their implementation is becoming increasingly sophisticated.

Navigating Open Table Formats: Iceberg, Delta Lake, and Hudi
Discussions about open table formats often spark debate, but each format offers unique strengths and is backed by a dedicated engineering community:
- Iceberg and Delta Lake share many similarities, with ongoing discussions about potential standardization.
- Hudi specializes in streaming use cases, optimizing real-time data ingestion and processing. [Listen to The Future of Data Lakehouses: A Fireside Chat with Vinoth Chandar - Founder CEO Onehouse & PMC Chair of Apache Hudi]
Most modern query engines support Delta Lake and Iceberg, reinforcing their prominence in the Lakehouse ecosystem. While Hudi and Paimon have smaller adoption, broader query engine support for all major formats is expected over time.

Examining Apache XTable’s Role
Apache XTable aims to improve interoperability between different table formats. While the concept is practical, its long-term relevance remains uncertain. As the industry consolidates around fewer preferred formats, converting between them may introduce unnecessary complexity, latency, and potential points of failure, especially at scale.

Challenges and Criticisms of Lakehouse Architecture
One common criticism of Lakehouse architecture is its lower level of abstraction compared with traditional databases. Developers often need to understand the underlying file system, whereas databases provide a more seamless experience by abstracting storage management. The challenge is to balance the Lakehouse’s flexibility with the ease of use of traditional databases.

Best Practices for Lakehouse Adoption
A successful Lakehouse implementation starts with a well-defined strategy that aligns with business objectives. Organizations should:
• Establish a clear vision and end goals.
• Design a scalable and efficient architecture from the outset.
• Select the right open table format based on workload requirements.

The Significance of Shared Storage
Shared storage is a foundational principle of Lakehouse architecture. By storing data in a single location and transforming it once, organizations can analyze it using multiple tools and platforms. This approach reduces costs, simplifies data management, and enhances agility by allowing teams to choose the most suitable tool for each task.

Catalogs: Essential Components of a Lakehouse
Catalogs are crucial in Lakehouse implementations as metadata repositories describing data assets. These catalogs fall into two categories:
- Technical catalogs, which focus on data management and organization.
- Business catalogs, which provide a business-friendly view of the data landscape.
A growing trend in the industry is the convergence of technical and business catalogs to offer a unified view of data across the organization. Innovations like the Iceberg REST catalog specification have advanced catalog management by enabling a decoupled and standardized approach.

The Future of Catalogs: AI and Machine Learning Integration
In the coming years, AI and machine learning will drive the evolution of data catalogs. Automated data discovery, governance, and optimization will become more prevalent, allowing organizations to unlock new AI-powered insights and streamline data management processes.

The Changing Role of Data Engineers in the AI Era
The rise of AI is transforming the role of data engineers. Traditional responsibilities like building data pipelines are shifting towards platform engineering and enabling AI-driven data capabilities. Moving forward, data engineers will focus on:
• Designing and maintaining AI-ready data infrastructure.
• Developing tools that empower software engineers to leverage data more effectively.

Final Thoughts
Lakehouse architecture is rapidly evolving, with growing adoption across cloud ecosystems and advancements in open table formats, cataloging, and AI integration. While challenges remain, particularly around abstraction and complexity, the benefits of flexibility, cost efficiency, and scalability make it a compelling approach for modern data workloads.
Organizations investing in a Lakehouse strategy should prioritize best practices, stay informed about emerging trends, and build architectures that support current and future data needs.
All rights reserved, ProtoGrowth Inc., India. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers’ opinions. This is a public episode.
If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.dataengineeringweekly.com
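As a small illustration of the decoupled catalog idea discussed above, the sketch below loads an Iceberg table through a REST catalog with PyIceberg and reads it into Apache Arrow. The endpoint URI and the table identifier are placeholders; the point is that any engine speaking the Iceberg REST catalog specification can discover and read the same tables sitting in shared object storage.

```python
from pyiceberg.catalog import load_catalog

# Connect to a REST catalog; the URI is a placeholder endpoint.
catalog = load_catalog(
    "lakehouse",
    **{
        "type": "rest",
        "uri": "https://catalog.example.com",
    },
)

# Load a table by its namespace-qualified name (placeholder) and read it
# into an Arrow table for local analysis.
table = catalog.load_table("analytics.trips")
arrow_table = table.scan().to_arrow()
print(arrow_table.schema)
```

Because the catalog, the table format, and the storage are all open interfaces, the same table could equally be queried from Spark, Trino, or a warehouse engine without copying the data, which is the shared-storage benefit Roy emphasizes.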

Feb 19, 2025 • 37min
Beyond Kafka: Conversation with Jark Wu on Fluss - Streaming Storage for Real-Time Analytics
Jark Wu, a prominent figure at Alibaba Cloud and a leading developer of Flink SQL, dives into Fluss, a novel streaming storage solution designed for real-time analytics. He discusses how Fluss overcomes the limitations of Kafka, focusing on its Lakehouse-native architecture for better schema management. The episode highlights the architectural distinctions that enable Fluss to enhance data processing and state management. Wu also explores Fluss's role in data replication and its use cases in major enterprises, showcasing its advantages for scalability and efficiency.

Jan 9, 2025 • 48min
The Future of Data Lakehouses: A Fireside Chat with Vinoth Chandar - Founder CEO Onehouse & PMC Chair of Apache Hudi
Vinoth Chandar, Founder and CEO of Onehouse and PMC Chair of Apache Hudi, discusses the evolution of lakehouse technology. He shares insights on Apache Hudi's impact on data engineering and explores challenges in building high-scale data ecosystems. The conversation highlights innovations in Hudi 1.0, including enhanced concurrency and update features. Additionally, they delve into the role of open source in the data landscape, emphasizing the importance of standardization and collaboration among emerging data formats.

Dec 29, 2024 • 1h 11min
Agents of Change: Navigating 2025 with AI and Data Innovation
In this episode of DEW, the hosts and guests discuss their predictions for 2025, focusing on the rise and impact of agentic AI. The conversation covers three main categories:
1. The role of agentic AI
2. The future workforce dynamic involving humans and AI agents
3. Innovations in data platforms heading into 2025
Highlights include insights from Aswin and our special guest, Rajesh, on building robust agent systems, strategies for data engineers and AI engineers to remain relevant, data quality and observability, and the evolving landscape of Lakehouse architectures. The discussion also covers the challenges of integrating multi-agent systems and the economic implications of AI sovereignty and data privacy.
00:00 Introduction and Predictions for 2025
01:49 Exploring Agentic AI
04:44 The Evolution of AI Models
16:36 Enterprise Data and AI Integration
25:06 Managing AI Agents
36:37 Opportunities in AI and Agent Development
38:02 The Evolving Role of AI and Data Engineers
38:31 Managing AI Agents and Data Pipelines
39:05 The Future of Data Scientists in AI
40:03 Multi-Agent Systems and Interoperability
44:09 Economic Viability of Multi-Agent Systems
47:06 Data Platforms and Lakehouse Implementations
53:14 Data Quality, Observability, and Governance
01:02:20 The Rise of Multi-Cloud and Multi-Engine Systems
01:06:21 Final Thoughts and Future Outlook
This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.dataengineeringweekly.com

Dec 25, 2023 • 38min
Data Engineering Trends With Aswin & Ananth
Welcome to another insightful edition of Data Engineering Weekly. As we approach the end of 2023, it's an opportune time to reflect on the key trends and developments that have shaped the field of data engineering this year. In this article, we'll summarize the crucial points from a recent podcast featuring Ananth and Aswin, two prominent voices in the data engineering community.

Understanding the Maturity Model in Data Engineering
A significant part of our discussion revolved around the maturity model in data engineering. Organizations must recognize their current position on the data maturity spectrum to make informed decisions about adopting new technologies. This approach ensures that new tools and practices align with the organization's readiness and specific needs.

The Rising Impact of AI and Large Language Models
2023 witnessed a substantial impact of AI and large language models on data engineering. These technologies are increasingly automating processes like ETL, improving data quality management, and evolving the landscape of data tools. Integrating AI into data workflows is not just a trend but a paradigm shift, making data processes more efficient and intelligent.

Lakehouse Architectures: The New Frontier
Lakehouse architectures have been at the forefront of data engineering discussions this year. The key focus has been interoperability among different data lake formats and the seamless integration of structured and unstructured data. This evolution marks a significant step towards more flexible and powerful data management systems.

The Modern Data Stack: A Critical Evaluation
The modern data stack (MDS) has been a hot topic, with debates around its sustainability and effectiveness. While MDS has driven hyper-specialization in product categories, challenges in integration and overlapping tool categories have raised questions about its long-term viability. The future of MDS remains a subject of keen interest as we move into 2024.

Embracing Cost Optimization
Cost optimization has emerged as a priority in data engineering projects. With the shift to cloud services, managing costs effectively while maintaining performance has become a critical concern. This trend underscores the need for efficient architectures that balance performance with cost-effectiveness.

Streaming Architectures and the Rise of Apache Flink
Streaming architectures have gained significant traction, with Apache Flink leading the way. Its growing adoption highlights the industry's shift towards real-time data processing and analytics. The support and innovation around Apache Flink suggest a continued focus on streaming architectures in the coming year.

Looking Ahead to 2024
As we look towards 2024, there's a sense of excitement about the potential changes in fundamental layers like S3 Express and the broader impact of large language models. The anticipation is for more intelligent data platforms that effectively combine AI capabilities with human expertise, driving innovation and efficiency in data engineering.

In conclusion, 2023 has been a year of significant developments and shifts in data engineering. As we move into 2024, the focus will likely be on refining these trends and exploring new frontiers in AI, lakehouse architectures, and streaming technologies. Stay tuned for more updates and insights in the next editions of Data Engineering Weekly. Happy holidays, and here's to a groundbreaking 2024 in data engineering! This is a public episode.
If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.dataengineeringweekly.com

Jul 5, 2023 • 23min
DEW #133: How to Implement Write-Audit-Publish (WAP), Vector Database - Concepts and examples & Data Warehouse Testing Strategies for Better Data Quality
Welcome to another episode of Data Engineering Weekly. Aswin and I select 3 to 4 articles from each edition of Data Engineering Weekly and discuss them from the author’s and our perspectives.
On DEW #133, we selected the following articles:

lakeFS: How to Implement Write-Audit-Publish (WAP)
I wrote extensively about the WAP pattern in my latest article, An Engineering Guide to Data Quality - A Data Contract Perspective. Super excited to see a complete guide on implementing the WAP pattern in Iceberg, Hudi, and, of course, lakeFS.
https://lakefs.io/blog/how-to-implement-write-audit-publish/

Jatin Solanki: Vector Database - Concepts and examples
Staying with vector search, a new class of vector databases is emerging in the market to improve semantic search experiences. The author writes an excellent introduction to vector databases and their applications.
https://blog.devgenius.io/vector-database-concepts-and-examples-f73d7e683d3e

Policy Genius: Data Warehouse Testing Strategies for Better Data Quality
Data testing and data observability are widely discussed topics in Data Engineering Weekly. However, both techniques validate data only after the transformation task is completed. Can we test SQL business logic during the development phase itself? Perhaps unit test the pipeline?
The author writes an exciting article about adopting unit testing in the data pipeline by producing sample tables during development. We will see more tools around unit test frameworks for the data pipeline soon. I don’t think testing data quality on all the PRs against the production database is a cost-effective solution. We can do better than that, tbh.
https://medium.com/policygenius-stories/data-warehouse-testing-strategies-for-better-data-quality-d5514f6a0dc9
This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.dataengineeringweekly.com
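As a companion to the WAP discussion above, here is a minimal, engine-agnostic sketch of the pattern in PySpark: write new data to a staging table, audit it, and only publish to the production table if the checks pass. Table names, the input path, and the audit rules are placeholders; the lakeFS guide (and Iceberg's branch and snapshot features) implement the same three steps with stronger atomic-publish guarantees than this plain staging-table version.

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("wap-sketch").getOrCreate()

# 1. Write: land the new batch in a staging table, not in production.
batch = spark.read.parquet("s3://bucket/incoming/2023-07-05/")  # placeholder path
batch.write.mode("overwrite").saveAsTable("staging.orders_batch")

# 2. Audit: run data-quality checks against the staged data only.
staged = spark.table("staging.orders_batch")
audits_pass = (
    staged.count() > 0
    and staged.filter(F.col("order_id").isNull()).count() == 0
)

# 3. Publish: expose the data to consumers only if the audits pass.
if audits_pass:
    staged.write.mode("append").saveAsTable("prod.orders")
else:
    raise ValueError("WAP audit failed; staged batch was not published")
```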

Jul 5, 2023 • 35min
DEW #132: The New Generative AI Infra Stack, Databricks cost management at Coinbase, Exploring an Entity Resolution Framework Across Various Use Cases & What's the hype behind DuckDB?
Welcome to another episode of Data Engineering Weekly. Aswin and I select 3 to 4 articles from each edition of Data Engineering Weekly and discuss them from the author’s and our perspectives.
On DEW #132, we selected the following articles:

Cowboy Ventures: The New Generative AI Infra Stack
Generative AI has taken the tech industry by storm. In Q1 2023, a whopping $1.7B was invested into gen AI startups. Cowboy Ventures unbundles the various categories of the generative AI infra stack here.
https://medium.com/cowboy-ventures/the-new-infra-stack-for-generative-ai-9db8f294dc3f

Coinbase: Databricks cost management at Coinbase
Effective cost management in data engineering is crucial, as it maximizes the value gained from data insights while minimizing expenses. It ensures sustainable and scalable data operations, fostering a balanced business growth path in the data-driven era. Coinbase writes a case study about cost management for Databricks and how they use the open-source Overwatch tool to manage Databricks costs.
https://www.coinbase.com/blog/databricks-cost-management-at-coinbase

Walmart: Exploring an Entity Resolution Framework Across Various Use Cases
Entity resolution, a crucial process that identifies and links records representing the same entity across various data sources, is indispensable for generating powerful insights about relationships and identities. This process, often leveraging fuzzy matching techniques, not only enhances data quality but also facilitates nuanced decision-making by effectively managing relationships and tracking potential matches among data records. Walmart writes about the pros and cons of approaching fuzzy matching with rule-based and ML-based matching.
https://medium.com/walmartglobaltech/exploring-an-entity-resolution-framework-across-various-use-cases-cb172632e4ae

Matt Palmer: What's the hype behind DuckDB?
So, DuckDB: is it hype, or does it have real potential to bring architectural changes to the data warehouse? The author explains how DuckDB works and its potential impact on data engineering.
https://mattpalmer.io/posts/whats-the-hype-duckdb/
This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.dataengineeringweekly.com
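To give a flavor of why DuckDB generates so much discussion, here is a tiny, self-contained example: an in-process engine querying Parquet files directly with SQL, with no warehouse or cluster involved. The file path is a placeholder.

```python
import duckdb

# In-memory database; nothing to deploy or provision.
con = duckdb.connect()

# Query Parquet files in place; the glob path is a placeholder.
result = con.sql("""
    SELECT order_date, SUM(amount) AS revenue
    FROM read_parquet('orders/*.parquet')
    GROUP BY order_date
    ORDER BY order_date
""")
print(result.df())  # materialize the result as a pandas DataFrame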