

The Data Stack Show
RudderStack
Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
Episodes

Nov 1, 2023 • 57min
162: Accelerating Enterprise AI Transformation With Open Source LLMs Featuring Mark Huang of Gradient
Highlights from this week’s conversation include:
The potential of AI-driven applications (1:34)
The need for hardware infrastructure in AI experimentation (2:40)
Oligopoly on the closed side (11:50)
Advantages of private side vs. open source (13:18)
Leveraging valuable data within enterprises (16:00)
The urgency of adopting LLMs in the enterprise (24:02)
Expansion of LLMs into new business verticals (25:06)
The challenges of operationalizing LLMs (29:32)
Seamless experience with OpenAI (37:29)
Operationalizing with Gradient (38:36)
The early genesis of Gradient (48:53)
The democratization of AI through endpoints (51:44)
What is the future of language models? (54:07)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com. Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.

Oct 30, 2023 • 4min
The PRQL: How LLMs are Transforming Enterprise Workflows with Mark Huang of Gradient
In this bonus episode, Eric and Kostas preview their upcoming conversation with Mark Huang of Gradient.

Oct 25, 2023 • 1h 21min
161: The Intersection of Generative AI and Data Infrastructure with Chang She of LanceDB
Highlights from the podcast include the challenges in data collection, AI hype impact, LanceDB's file and table format, Vector Database introduction, importance of unstructured data, potential of generative AI, and changing expectations in information systems.

Oct 23, 2023 • 5min
The PRQL: How Did Pandas Become a Data Science Powerhouse? Featuring Chang She of Eto Labs
In this bonus episode, Eric and Kostas preview their upcoming conversation with Chang She of Eto Labs.

Oct 18, 2023 • 1h 6min
160: Closing the Gap Between Dev Teams and Data Teams with Santona Tuli of Upsolver
Highlights from this week’s conversation include:
Santona’s journey from nuclear physics to data science (4:59)
The appeal of startups and wearing multiple hats (8:12)
The challenge of pseudoscience in the news (10:24)
Approaching data with creativity and rigor (13:22)
Challenges and differences in data workflows (14:39)
Schema Evolution and Quality Problems (27:01)
Real-time Data Monitoring and Anomaly Detection (30:34)
The importance of data as a business differentiator (35:48)
The SQL job creation process (46:25)
Different options for creating solver jobs (47:20)
Adding column-level expectations (50:17)
Discussing the differences of working with data as a scientist and in a startup (1:00:18)
Final thoughts and takeaways (1:04:01)

Oct 16, 2023 • 6min
The PRQL: The Intersection of Physics, Data Science, and Product Development with Santona Tuli of Upsolver
In this bonus episode, Eric and Kostas preview their upcoming conversation with Santona Tuli of Upsolver.

Oct 11, 2023 • 1h 9min
159: What Is a Vector Database? Featuring Bob van Luijt of Weaviate
Highlights from this week’s conversation include:
How music impacted Bob’s data journey (3:16)
Music’s relationship with creativity and innovation (11:38)
The genesis of Weaviate and the idea of vector databases (14:09)
The joy of creation (19:02)
OLAP Databases (22:21)
The progression of complexity in databases (24:31)
Vector database (29:23)
Scaling suboptimal algorithms (34:34)
The future of vector space representation (35:51)
Databases’ role in different industries (39:14)
The brute force approach to discovery (45:57)
Retrieval augmented generation (51:26)
How a generative model interacts with the database (57:55)
Final thoughts and takeaways (1:03:20)

Oct 9, 2023 • 5min
The PRQL: Enhancing Search and Recommendation Systems with Vector Databases with Bob van Luijt of Weaviate
In this bonus episode, Eric and Kostas preview their upcoming conversation with Bob van Luijt of Weaviate.

Oct 4, 2023 • 1h 2min
158: The Orchestration Layer as the Data Platform Control Plane With Nick Schrock of Dagster Labs
Nick Schrock, Founder of Dagster Labs, discusses his background in data engineering and the founding of Dagster Labs. The conversation covers the evolution of data engineering, fragmentation in data infrastructure, the role of orchestration in data platforms, lessons learned from working with GraphQL, different orchestrators in the data infrastructure landscape, the role of MLOps in data engineering, and the future of data teams and orchestration.

Oct 2, 2023 • 3min
The PRQL: The Power of Data Orchestration: A Game-Changer for Data Infrastructure, Featuring Nick Schrock of Dagster Labs
Nick Schrock, Co-founder of Dagster Labs, discusses the power of data orchestration and its impact on data infrastructure. He explores the history, current state, and future of orchestration, and shares insights from his experience at Facebook.