Let's learn how to survive in the Modern Data Stack in 2023... Chat with Josh Wills (ex-Slack)
Jul 3, 2023
Josh Wills, former Director of Data at Slack/Cloudera, discusses the challenges and benefits of the Modern Data Stack. The conversation covers the evolution of data technology, the changing landscape of data and engineering, the potential of dbt, the future of hybrid execution, and the lack of a framework in the modern data stack.
DuckDB simplifies data transformations and helps control costs by integrating with dbt and offering local execution.
The modern data stack has evolved due to hardware advancements and convenience, with a growing trend towards locally executed data operations for accessibility and reduced reliance on costly cloud services.
Deep dives
DuckDB and the Future of Data Engineering
DuckDB is a powerful yet simple tool that has gained popularity among data analysts and engineers. It integrates easily with dbt, which makes data transformations straightforward to run. The goal is to make dbt more accessible and convenient for users without compromising on functionality. DuckDB's local execution, along with its compatibility with other tools like ClickHouse, suits teams looking to reduce costs while scaling their data operations. The future of data engineering lies in making costs clear and visible, so that users can make informed decisions about their data stack and allocate resources efficiently.
The Evolution of the Modern Data Stack
The modern data stack has come a long way, from the days of Hadoop to the rise of cloud-based solutions like Snowflake. The key drivers of this evolution have been hardware advancements and convenience. Hardware costs have decreased, making it more feasible to process large volumes of data, and tools like Spark have leveraged memory effectively for faster processing. Open-source solutions like dbt have brought convenience by allowing users to download and set up the stack easily. The cloud has further enhanced these benefits with infinite storage and a convenient distribution model. Looking ahead, there is a growing trend towards locally executed data operations, leveraging the power of individual machines. This approach offers convenience and reduces reliance on costly cloud services, making data operations more accessible to a wider audience of developers.
dbt and the Democratization of Data
dbt has emerged as a popular tool for data transformations, helping organizations democratize access to data. Its accessibility and simplicity have made it a go-to choice for analysts and engineers alike. The key to dbt's success lies in its ability to bridge the gap between the data team and business users. With features like dbt Cloud and data contracts, dbt enables business users to consume data with clarity and context, while giving data teams the control and visibility they need. dbt's plugin capabilities, such as the integration with DuckDB, extend its functionality and add flexibility. The future of dbt lies in its evolution as a metadata business, solving the challenges of data integration and communication between different teams within an organization.
The Promise of DuckDB and MotherDuck
DuckDB is not only a powerful embedded database, but also an enabler of hybrid execution. By using the compute power of local machines and turning to cloud-based services like MotherDuck for scalable data processing, organizations can reduce costs and gain flexibility. The goal is to offer a seamless transition between DuckDB and cloud services according to the needs of each task or workload. This approach makes it possible to run data operations anywhere, whether on a Kubernetes cluster or on a local laptop. The vision is to make costs clear and visible and to let users make informed decisions about the best execution environment for their data operations.
The Modern Data Stack has swept the industry over the last few years, and even though it's still generating buzz, it can seem more daunting than empowering.
Ian and Tim sat down with Josh Wills (ex-Director of Data at Slack / Cloudera) to talk about all things data, including his dbt + duckdb package, for which he wants no more open source contributions.
Sit down and relax with this one, but we're sure you'll pick up something here with our YAIGers.