
Data Engineering Podcast
Adding An Easy Mode For The Modern Data Stack With 5X
Podcast summary created with Snipd AI
Quick takeaways
- 5X Data aims to improve the user experience of the modern data stack by pre-integrating the best tools from each layer of the stack into a single, adaptable platform.
- 5X Data started as a course but pivoted to building a modular and customizable data platform that consolidates the fragmented data ecosystem, aiming to provide an end-to-end experience for managing the modern data stack.
- Many enterprises are still focused on core data stack components, leaving a gap between industry AI trends and enterprise adoption. 5X Data sees this as an opportunity to deliver an end-to-end platform that simplifies data management and supports future AI initiatives.
Deep dives
Simplifying the Data Stack with 5X Data
5X Data is building a platform to improve the user experience of the modern data stack. Rather than replacing existing tools, the platform pre-integrates vendors from each layer of the stack and allows for customization and scalability, with the goal of serving businesses of all sizes. By consolidating the data stack and providing a unified experience, 5X Data aims to make data teams more efficient and enable them to focus on delivering actionable insights.
The Evolution of 5X Data's Focus
5X Data started as a course for companies wanting to invest in data. After realizing the lack of expertise in the space, they pivoted to building a platform that provides a playbook and tools for businesses to easily build a scalable data platform. Over time, 5X Data has focused on creating an end-to-end experience and consolidating the fragmented data ecosystem. They aim to provide a modular and customizable data platform that evolves with the needs of their customers, offering an efficient and unified solution for managing the modern data stack.
The Impact of AI in the Data Infrastructure Ecosystem
The rapid pace of AI adoption in recent months has put pressure on data teams to develop an AI strategy. While AI is still in its infancy in the data space, there is a growing need for tools that support conversational BI and semantic layers to provide context for AI models. However, enterprise adoption is lagging, with many companies still focused on core data stack components. This gap between industry trends and enterprise adoption presents an opportunity for 5X Data to deliver an end-to-end platform that simplifies data management and supports future AI initiatives.
Consolidation and the Shift in the Data Infrastructure Economy
Data infrastructure has experienced a cycle of consolidation and ecosystem shifts. The previous trend of investing in various data tools without considering the ROI has given way to a more focused approach, driven by cost and the pressure to show value. Consolidation is necessary to reduce complexity and optimize costs, but each category will still have dominant players serving different use cases. 5X Data aims to provide a consolidated data platform that offers optionality for customers, with the ability to add or swap out vendors based on their specific needs.
Building a Simple and Efficient Data Platform with 5X Data
5X Data is committed to simplifying the data engineering process by providing a unified platform that reduces complexity and streamlines workflows. Their unified IDE allows users to operate various vendors and tools within a single interface, promoting consistency and best practices. The platform offers customization and flexibility, allowing engineers to work with their preferred tools while maintaining a cohesive experience. By eliminating the need for manual integrations and providing a centralized platform, companies can save time and resources, making their data teams more efficient.
Summary
The "modern data stack" promised a scalable, composable data platform that gave everyone the flexibility to use the best tools for every job. The reality was that it left data teams in the position of spending all of their engineering effort on integrating systems that weren't designed with compatible user experiences. The team at 5X understands the pain involved and the barriers to productivity, and set out to solve it by pre-integrating the best tools from each layer of the stack. In this episode founder Tarush Aggarwal explains how the realities of the modern data stack are impacting data teams and the work that 5X is doing to accelerate time to value.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack
- You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!
- Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and DoorDash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
- Your host is Tobias Macey and today I'm welcoming back Tarush Aggarwal to talk about what he and his team at 5X are building to improve the user experience of the modern data stack.
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what 5X is and the story behind it?
- We last spoke in March of 2022. What are the notable changes in the 5X business and product?
- What are the notable shifts in the data ecosystem that have influenced your adoption and product direction?
- What trends are you most focused on tracking as you plan the continued evolution of your offerings?
- What are the points of friction that teams run into when trying to build their data platform?
- Can you describe the design of the system that you have built?
- What are the strategies that you rely on to support adaptability and speed of onboarding for new integrations?
- What are some of the types of edge cases that you have to deal with while integrating and operating the platform implementations that you design for your customers?
- What is your process for selection of vendors to support?
- How would you characterize your relationships with the vendors that you rely on?
- For customers who have pre-existing investment in a portion of the data stack, what is your process for engaging with them to understand how best to support their goals?
- What are the most interesting, innovative, or unexpected ways that you have seen 5X used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on 5X?
- When is 5X the wrong choice?
- What do you have planned for the future of 5X?
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Sponsored By:
- Starburst: This episode is brought to you by Starburst - a data lake analytics platform for data engineers who are battling to build and scale high quality data pipelines on the data lake. Powered by Trino, Starburst runs petabyte-scale SQL analytics fast at a fraction of the cost of traditional methods, helping you meet all your data needs ranging from AI/ML workloads to data applications to complete analytics. Trusted by the teams at Comcast and DoorDash, Starburst delivers the adaptability and flexibility a lakehouse ecosystem promises, while providing a single point of access for your data and all your data governance allowing you to discover, transform, govern, and secure all in one place. Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Try Starburst Galaxy today, the easiest and fastest way to get started using Trino, and get $500 of credits free. [dataengineeringpodcast.com/starburst](https://www.dataengineeringpodcast.com/starburst)
- Rudderstack:  Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at [dataengineeringpodcast.com/rudderstack](https://www.dataengineeringpodcast.com/rudderstack)
- Materialize:  You shouldn't have to throw away the database to build with fast-changing data. Keep the familiar SQL, keep the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. That is Materialize, the only true SQL streaming database built from the ground up to meet the needs of modern data products: Fresh, Correct, Scalable — all in a familiar SQL UI. Built on Timely Dataflow and Differential Dataflow, open source frameworks created by cofounder Frank McSherry at Microsoft Research, Materialize is trusted by data and engineering teams at Ramp, Pluralsight, Onward and more to build real-time data products without the cost, complexity, and development time of stream processing. Go to [materialize.com](https://materialize.com/register/?utm_source=depodcast&utm_medium=paid&utm_campaign=early-access) today and get 2 weeks free!