

The Data Stack Show
RudderStack
Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
Episodes

Nov 29, 2023 • 1h 12min
166: Data Processing Fundamentals and Building a Unified Execution Engine Featuring Pedro Pedreira of Meta
Highlights from this week’s conversation include:
The concept of composable at a lower level of data infrastructure (1:28)
New architectures and components that allow developers to build databases (3:44)
Pedro's background and experience in data infrastructure (6:18)
The Spectrum of Latency and Analytics (12:59)
Different Query Engines for Different Use Cases (16:32)
Vectorized vs Code Gen Data Processing (19:33)
Vectorization and Code Generation (21:21)
Examples of Vectorized Engines (24:33)
Rewriting Execution Engine in C++ (27:22)
Different Organization of Presto and Spark (33:17)
Arrow and its Extensions (37:15)
The similarities between analytics and ML (44:33)
Offline feature engineering and data preprocessing for training (48:00)
Dialect and semantic differences in using Velox for different engines (50:01)
The convergence of dialects (52:23)
Challenges of substrate and semantics (53:18)
Future plans for Velox (58:09)
The discussion on evolving Parquet (1:03:38)
The integration of the relational model and the tensor model (1:07:29)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.

Nov 27, 2023 • 6min
The PRQL: How Does Composability in Data Infrastructure Differ at Different Levels of Abstraction? Featuring Pedro Pedreira of Meta
In this bonus episode, Eric and Kostas preview their upcoming conversation with Pedro Pedreira of Meta.

Nov 22, 2023 • 54min
165: SQL Queries, Data Modeling, and Data Visualization with Colin Zima of Omni
Highlights from this week’s conversation include:
Colin's Background and Starting Omni (1:48)
Defining “good” at Google search early in his career (4:42)
Looker's Unique Approach to Analytics (9:48)
The paradigm shift in analytics (10:52)
The architecture of Looker and its influence (12:04)
Combatting the challenge of unbundling in the data stack (14:26)
The evolution of analytics engineering (21:50)
Enhancing user flexibility in Omni (23:44)
The evolution of BI tools (32:53)
What does the future look like for BI tools? (35:14)
The role of Python and notebooks in BI (39:48)
The product experience of Omni and its vision (45:27)
Expectations for the future of Omni (47:52)
The relationship between algorithms and business logic (50:51)

Nov 20, 2023 • 2min
The PRQL: Building a Data Product for Data People: Looker's Vision and Omni's Future with Colin Zima
In this bonus episode, Eric and Kostas preview their upcoming conversation with Colin Zima of Omni.

Nov 15, 2023 • 57min
164: How The GTM and Data Teams at Snowflake Work Together with Travis Henry and Hillary Carpio
Hillary Carpio, who leads account-based marketing at Snowflake, and Travis Henry, focused on sales operations, share their insights on the dynamic partnership between marketing and data teams. They discuss the significance of account-based marketing versus traditional strategies, emphasizing personalized outreach and the role of Sales Development Representatives. Hillary and Travis also touch on data overload, the importance of clear communication, and their unexpected journey towards writing a book together, offering lessons in collaboration and adaptability in the tech world.

Nov 13, 2023 • 5min
The PRQL: Navigating the World of Data Overload with Travis Henry and Hillary Carpio of Snowflake
In this bonus episode, Eric and Kostas preview their upcoming conversation with Travis Henry and Hillary Carpio of Snowflake.

Nov 8, 2023 • 1h 4min
163: Simplifying Real-Time Streaming with David Yaffe and Johnny Graettinger of Estuary
Highlights from this week’s conversation include:
Johnny and David’s background in working together (1:56)
The background story of Estuary (4:15)
The challenges of ad tech and the need for low latency (5:44)
Use cases for moving data at scale (10:35)
Real-time data replication methods (11:54)
Challenges with Kafka and the birth of Gazette (13:54)
Comparing Kafka and Gazette (20:22)
The importance of existing streaming tools (22:28)
Challenges of managing Kafka and the need for a different approach (23:40)
The role of compaction in streaming applications (26:54)
The challenge of relaxing state management (34:01)
Replication and the problem of data synchronization (36:48)
Incremental Back Fills and Risk-Free Production Database (46:03)
Estuary as a Platform and Connectors (47:45)
The challenges of real-time streaming (57:56)
Orchestration in real-time streaming (1:00:51)

Nov 6, 2023 • 4min
The PRQL: The Shortcomings of Apache Kafka with David Yaffe and Johnny Graettinger of Estuary
In this bonus episode, Eric and Kostas preview their upcoming conversation with David Yaffe and Johnny Graettinger of Estuary.

Nov 1, 2023 • 57min
162: Accelerating Enterprise AI Transformation With Open Source LLMs Featuring Mark Huang of Gradient
Highlights from this week’s conversation include:
The potential of AI-driven applications (1:34)
The need for hardware infrastructure in AI experimentation (2:40)
Oligopoly on the closed side (11:50)
Advantages of private side vs. open source (13:18)
Leveraging valuable data within enterprises (16:00)
The urgency of adopting LLMs in the enterprise (24:02)
Expansion of LLMs into new business verticals (25:06)
The challenges of operationalizing LLMs (29:32)
Seamless experience with OpenAI (37:29)
Operationalizing with Gradient (38:36)
The early genesis of Gradient (48:53)
The democratization of AI through endpoints (51:44)
What is the future of language models? (54:07)

Oct 30, 2023 • 4min
The PRQL: How LLMs are Transforming Enterprise Workflows with Mark Huang of Gradient
In this bonus episode, Eric and Kostas preview their upcoming conversation with Mark Huang of Gradient.