182: Building a Dynamic Data Infrastructure at Enterprise Scale Featuring Kevin Liu of Stripe
Mar 20, 2024
Kevin Liu from Stripe discusses evolving data infrastructure, speech recognition work at Amazon, metadata analysis surprises, product sizing, data pipelining, and the future of open source projects in data infrastructure.
Stripe prioritizes data product development for enhanced customer data integration.
Centralizing data makes it hard to keep metadata consistent across multiple storage and query platforms.
Trend towards componentized data systems with focus on Apache Iceberg and Arrow projects.
Deep dives
The Evolution of Data Council Austin
Attendees at Data Council Austin can join top speakers and startups to interact and learn about the latest technologies in data science and AI. The event aims to provide a vendor-neutral technical data conference experience, attracting smart founders, engineers, scientists, and industry leaders to collaborate on shaping the future of data and AI.
Kevin Liu's Role at Stripe
Kevin Liu, a software engineer at Stripe, has worked on data infrastructure built on technologies such as Trino and Iceberg to support internal analytics. He has recently moved into data product development with Stripe Data Pipeline, which lets merchants integrate their Stripe data into their existing data ecosystems.
Innovation at Stripe through Data Productization
Stripe's move towards productizing data reflects customer demand for data accessibility beyond the platform's existing solutions like Stripe Sigma. By offering data integration capabilities through Stripe Data Pipeline, merchants can incorporate their Stripe data into their data infrastructure, enhancing their analytical capabilities and flexibility.
Challenges in Centralizing Data and Metadata Management
The centralization of data presents challenges in maintaining metadata across various storage and query platforms. Integrating data across warehouses like Redshift and Snowflake with table formats like Iceberg requires dealing with multiple catalogs, highlighting the complexity of managing consistent metadata across diverse data environments.
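As a rough illustration of the metadata-consistency problem described above, the sketch below compares one table's schema as reported by two catalogs. It is not Stripe's approach or any catalog's real API. The function name `schema_drift`, the column names, and the type strings are all hypothetical, and each catalog is reduced to a plain mapping of column name to type.

```python
from __future__ import annotations


def schema_drift(
    reference: dict[str, str], other: dict[str, str]
) -> dict[str, tuple[str | None, str | None]]:
    """Return columns whose type differs, or that exist in only one catalog.

    Each value is (type in reference, type in other); None means the
    column is missing from that catalog.
    """
    drift = {}
    for name in sorted(set(reference) | set(other)):
        ref_type = reference.get(name)
        other_type = other.get(name)
        if ref_type != other_type:
            drift[name] = (ref_type, other_type)
    return drift


# Hypothetical views of the same table in two different catalogs.
iceberg_catalog = {"charge_id": "string", "amount": "long", "created": "timestamp"}
warehouse_catalog = {"charge_id": "string", "amount": "int", "currency": "string"}

for column, (ref_type, other_type) in schema_drift(iceberg_catalog, warehouse_catalog).items():
    print(f"{column}: iceberg={ref_type} warehouse={other_type}")
```

With more than two catalogs, a check like this would have to run for every pair, or against a single designated source of truth, which is one reason consistency gets harder as platforms are added.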
Emerging Data Technologies and Ecosystem Integration
Kevin Liu's interest in Apache Iceberg and Apache Arrow reflects a broader trend towards componentized data systems. Projects like DataFusion aim to modularize traditional database functionality, so that planning, compute, and storage components can be mixed and matched across different data technologies.
Conclusion
The podcast episode featuring Kevin Liu from Stripe covers the evolving landscape of data technologies, focusing on Stripe's approach to data productization, the challenges of centralized data management, and emerging technologies like Apache Iceberg and Arrow. Kevin Liu's insights shed light on the solutions and integrations shaping the future of data handling and ecosystem interoperability.
Surprising Discoveries in Metadata Analysis (21:43)
Optimizing Cost and Value (23:55)
Product Sizing Stripe Data (26:39)
Popular Tool for Data Interaction (30:08)
Enabling Data Infrastructure Integration (35:22)
Value of Data Pipelining for Stripe (39:32)
Next Generation Product and Technology (43:54)
Maximizing Value in a Decentralized Environment (51:34)
Future of Open Source Projects in Data Infrastructure (57:59)
Final Thoughts and Takeaways (59:02)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.