

The Data Stack Show
RudderStack
Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
Episodes

Jun 11, 2021 • 50min
39: Diving deeper into CDC with Ali Hamidi and Taron Foxworth of Meroxa
Highlights from this week’s episode include:
Meroxa is a real-time data engineering managed platform (4:53)
Use cases for CDC (6:20)
Meroxa leverages open source tools to provide initial snapshots and start the CDC stream (12:29)
Making the platform publicly available (14:14)
What the Meroxa user experience looks like (16:10)
Raising Series A funding (17:49)
Easiest and most difficult data sources for CDC (20:23)
The current state of open CDC (23:16)
Expected latency when using CDC (29:56)
CDC, reverse ETL, and a focus on real-time (36:39)
What happens to existing parts of the stack when Meroxa is adopted? (39:45)
The Data Stack Show is a weekly podcast powered by RudderStack. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Jun 2, 2021 • 51min
38: Graph Databases & Data Governance with David Allen of Neo4j
Highlights from this week's episode include:
David’s background in comparative databases (1:50)
David’s experience and lessons he learned from writing his book (3:23)
How writing a technical book compares to writing technical documentation (4:41)
The process of writing a book (6:30)
The best and worst part of David’s book writing experience (8:02)
An introduction to what Neo4j is (9:08)
What you need to graph (11:13)
Typical problems a graph database is a good solution for (13:00)
The difference between performance and relational databases (18:41)
How Neo4j addresses performance and ergonomics (23:30)
Neo4j and scalability (26:20)
How Neo4j fits in the modern data stack (31:48)
Neo4j use cases (35:45)
Practical implementation of Neo4j (40:51)
Neo4j’s relationship with open source (45:50)

May 26, 2021 • 54min
37: The Components of Data Governance with Dave Melillo of FanDuel
Highlights from this week's episode include:
Dave's "nerdy" interests in sports statistics and data (2:12)
Trends in collecting, processing, and using data (4:45)
Finding a better term for "reverse ETL" (5:48)
The blurring of the distinction between sources and destinations (7:41)
The role of BI is changing (13:24)
Data governance and the physical execution behind it (19:00)
Data governance is defining and managing data in a logical way that is actionable by the business (23:43)
Consolidation of tools and services (28:49)
Databricks vs. Snowflake (33:49)
Dave's focus on regulatory data at FanDuel (45:47)

May 19, 2021 • 43min
36: Crypto and Compliance with Nick Fogle, Co-Founder of Churnkey and Wavve
On this week's episode of The Data Stack Show, Eric and Kostas talk with Nick Fogle, co-founder of Churnkey and Wavve. Together they discuss how having a legal background can impact engineering decisions, dealing with privacy and compliance concerns, and selling Wavve and starting Churnkey as a result.
Highlights from this week's episode include:
Nick's background in economics and law and teaching himself to code (2:01)
Thinking like a lawyer and trying to minimize risk to the greatest extent possible (4:23)
GDPR and compliance (8:23)
Blockchain contracts (18:26)
Unique challenges surrounding compliance with a cryptocurrency startup (21:41)
Reconciling the right to be forgotten, GDPR, and blockchain permanence (27:16)
Building Churnkey after developing it as a way to lower churn among Wavve users (31:31)
How Churnkey's stack works (37:16)
Crypto predictions (39:02)

May 12, 2021 • 54min
35: The Future of Development is Distributed with Jim Walker of Cockroach Labs
This week on The Data Stack Show, Eric and Kostas talk with Jim Walker, the VP of product marketing at Cockroach Labs, about distributed systems, competing against the speed of light, and making data easy.
Highlights from this week's episode include:
Jim's background of translating deep technical concepts into understandable English and his work at Cockroach Labs (2:23)
The origin of Cockroach Labs and distributed SQL (6:10)
Living without atomic clocks (10:10)
Having the speed of light as the ultimate competitor (13:49)
CockroachDB’s users (19:35)
Figuring out big data for transactions (25:14)
Dealing with failure (35:04)
Open source code, community, and consumption (39:26)
Making data easy, and what's next for Cockroach (43:12)
Bringing programming into marketing (46:18)
Mentioned Links:
Spanner White Paper
Raft & Paxos
Michael Stonebraker

Apr 28, 2021 • 49min
34: The Intersection of Data Engineering and Marketing with John Marbach of Grafana Labs
On this week's episode of The Data Stack Show, Eric and Kostas talk with John Marbach, senior growth manager at Grafana Labs. In this conversation, John discusses marketing ops and the blending of the data engineering and marketing roles.
Highlights from this week's episode include:
Introduction to John Marbach and working in the blurred lines between marketing and data engineering (2:14)
How managing pipeline building and consuming data influences the use of downstream tools (6:28)
Experiments in marketing (11:28)
Exploring the role of marketing ops (15:35)
How accruing technical debt can grind things to a halt (20:35)
Matching the stack with the company's scale (24:48)
CDPs and marketing to developers (28:40)
Biggest challenges and barriers between data engineering and marketing (35:19)
Takes on reverse ETL (39:07)
Thoughts on cryptocurrency and the blockchain (44:08)

Apr 14, 2021 • 57min
33: ML is a Data Quality Problem with Peter Gao from Aquarium Learning
On this week's episode of The Data Stack Show, Eric and Kostas talk with Peter Gao, co-founder and CEO at Aquarium Learning. Peter, a former engineer at Cruise Automation, and Aquarium Learning help ML teams improve their model performance by improving their data.
Highlights from this week's episode include:
How getting hit by a drunk driver made researching self-driving cars personal for Peter (2:12)
Filtering out the hype in self-driving car news to get a clear picture of its state today (6:52)
The data required for a self-driving vehicle (13:56)
Operation Vacation and how Aquarium can help provide the tools to make models better (16:53)
Utilizing neural networks to index data (20:41)
How Aquarium fits in the ML stack (30:25)
Interesting use cases of Aquarium (33:59)
Distinguishing subclasses of machine learning (40:05)
Human involvement in machine learning (46:13)

Apr 7, 2021 • 59min
32: Cooking with Data Ops with Chris Bergh from DataKitchen
On this week's episode of The Data Stack Show, Eric and Kostas talk with Chris Bergh, the CEO and head chef at DataKitchen. DataKitchen’s mission is to provide the software, service, and knowledge that makes it possible for every data and analytics team to realize their full potential with DataOps.
Highlights from this week's episode include:
Chris' background and how the lessons learned in the Peace Corps and at NASA apply to him today (2:03)
Why AI left Chris feeling like a jilted lover (7:49)
Most projects that people do in data analytics fail (10:12)
Three things that DataOps focuses on (16:37)
Comparing and contrasting DevOps and DataOps (22:30)
The types of data that DataKitchen handles and building a product or a service around DataOps (29:29)
Fixing problems at the source instead of just offering a tool to slightly improve things downstream (37:17)
Where we are at in the process of how companies are going to run on data (41:43)

Mar 31, 2021 • 43min
31: How a 160 Year-Old Publisher is Using Data with Jenna Lemonias From the Atlantic
On this week's episode of The Data Stack Show, Eric and Kostas chat with Jenna Lemonias, director of data science at The Atlantic. The Atlantic, a publication that's been around since 1857, is adapting with the times and is implementing and emulating some of the data science practices seen at big tech companies.
Highlights from this week's episode include:
Jenna's background in astrophysics and how she pivoted to data science (2:14)
Differences in dealing with data at a FinTech company and then at a publication (4:40)
The relationship between analog and digital data at The Atlantic (9:22)
How The Atlantic structures its data science team (11:44)
The role data engineering plays (14:42)
Using natural language processing and machine-generated metadata (17:37)
The Atlantic's data stack (28:22)
The kind of data that's important to The Atlantic (29:44)
Big projects forthcoming for the data science team (37:13)

Mar 24, 2021 • 1h 2min
30: The DataStack Journey with Rachel Bradley-Haas and Alex Dovenmuehle of Big Time Data
On this week’s episode of The Data Stack Show, Eric and Kostas are joined by the co-founders of Big Time Data, Rachel Bradley-Haas and Alex Dovenmuehle, formerly of Mattermost and, prior to that, Heroku. At Big Time Data, they work together to provide companies with the ability to derive value and insights from decentralized datasets, improve business processes through data enrichment and automation, and build a scalable foundation to enable a data-driven culture.
Highlights from this week’s episode include:
Rachel and Alex's background and their goal to make data approachable for companies everywhere (3:09)
The data stack journey: making decisions when you're small that allow you to grow with your data and your organization (12:28)
The problems faced when a data stack isn't nurtured early on (15:59)
Changes in data stack technology (21:32)
How Alex and Rachel's roles at Big Time Data differ and interact with each other (39:00)
Client use cases (43:34)
Comparing the stacks of seed-stage startups, mid-sized companies, and giant enterprises (48:54)