

The Data Engineering Show
The Firebolt Data Bros
The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory. Learn from the biggest influencers in tech about their practical day-to-day data challenges and solutions in a casual and fun setting.
SEASON 1 DATA BROS
Eldad and Boaz Farkash shared the same stuffed toys growing up as well as a big passion for data. After founding Sisense and building it to become a high-growth analytics unicorn, they moved on to their next venture, Firebolt, a leading high-performance cloud data warehouse.
SEASON 2 DATA BROS
In season 2, Eldad adopted a brilliant new little brother, and with their shared love for query processing, the connection was immediate. After excelling in his MS in Computer Science, Benjamin Wagner joined Firebolt to lead its query processing team and is a rising star in the data space.
For inquiries contact tamar@firebolt.io
Website: https://www.firebolt.io
Episodes

Jul 22, 2025 • 26min
Building Uber's AI Assistant: How Genie Revolutionizes On-Call Support with Paarth Chothani from Uber
Journey inside Uber's innovative AI assistant "Genie" with Paarth Chothani, Staff Engineer at Uber, as he shares how Uber is revolutionizing on-call support using LLMs and vector search. From processing massive amounts of internal documentation to building scalable RAG pipelines, discover how Uber tackles the challenges of implementing AI assistants at scale. Get insights into the evolution from traditional chatbots to agent-based solutions, and learn practical lessons about staying current in the rapidly evolving AI landscape. Whether you're building AI-powered tools or scaling data infrastructure, this episode offers valuable perspectives on balancing innovation with real-world implementation.
• Building and scaling RAG pipelines at enterprise scale
• Evolution from traditional chatbots to AI agents
• Practical insights on data processing and vector search implementation
• Leveraging open-source technologies in production environments
• Navigating rapid technological changes in AI development
What You'll Learn:
• How Uber transformed its on-call support system by building an AI assistant that searches across internal documentation, wikis, and code
• Why combining multiple data sources with vector databases produces more accurate and contextual responses for enterprise support
• The evolution from a basic RAG implementation to an agent-based architecture for handling complex support scenarios
• How to scale AI processing pipelines using Apache Spark for large-scale data chunking and embedding generation
• Why customization and internal data sources are crucial for an enterprise AI assistant's effectiveness
• The future of AI assistants: moving from documentation lookup to automated problem resolution through multi-agent systems
• How to balance rapid AI innovation with realistic customer expectations in fast-moving tech environments
Paarth is a Staff Engineer at Uber, where he works on Michelangelo, Uber's machine learning platform. With over four years at Uber, he specializes in feature store development, online serving at scale, and GenAI implementations. He has been instrumental in developing Genie, an AI-powered on-call assistant that changes how Uber's engineering teams handle support requests and access documentation. In this episode, Paarth shares insights on building and scaling RAG-based systems, implementing vector search, and the evolution of AI assistants from traditional chatbots to sophisticated agent-based solutions. His experience spanning both AWS chatbot development and current GenAI work at Uber gives listeners a unique perspective on the rapid advancement of AI-powered enterprise solutions.
Quotes
"Think of Genie as your on-call assistant. Different infra teams have their Slack channels, and because these technologies are widely used, you have to wait a lot." - Paarth
"What we realized is for our engineers to really get help, data sources really should be internal only because we customize lot of these open source engines for making it work at Uber scale." - Paarth
"Instead of building a mega scale pipeline that just ingest all data sources and then keeps a central data source solution, we instead are giving users the flexibility to ingest what data sources they want." - Paarth
"We had to scale our, you can say, the whole infra layer to chunk data faster to be able to create embeddings at scale." - Paarth
"It almost felt like they're doing what EMR was doing. You have your Hadoop and big data technology, and we needed these pipelines to basically process all this data quickly." - Paarth
"We've even evolved from just giving you the right documentation to starting to evolve into a situation where we'll also start taking actions on your behalf." - Paarth
"That intuition that comes from building this kind of bot, I feel like that intuition came again as we were starting to see this technology come, and we're like, hey, this looks like where you can pretty much fit all these pieces together." - Paarth
"What we have seen with several use cases is agentic genie works well when designed well, when you've analyzed the problem of which type of subproblems the bot should resolve per channel, per use case." - Paarth
"I think having a problem in mind always helps that way, the energy is little bit focused and directed." - Paarth
"Whatever you're building is not enough because the expectation has already gone to the next level, so the pace is too fast right now." - Paarth
Resources
Companies & Platforms:
• Uber - ML Platform & Engineering
• Firebolt - Cloud Data Warehouse (firebolt.io)
Tools & Technologies:
• Michelangelo - Uber's ML Platform
• Genie - Uber's On-Call Assistant Bot
• Cursor - Developer IDE
• OpenSearch - Vector Database
• LangGraph - Agent Framework
Notable Projects Mentioned:
• MetaMate (Meta)
• Query Copilot (Uber)
• Scale at AI (Meta Meetup)
Company Blogs:
• Uber Engineering Blog - Genie and Query Optimization articles
Primary Speakers:
• Paarth Chothani - Staff Engineer, Uber
• Benjamin - Firebolt
• Eldad - Firebolt
The Data Engineering Show is brought to you by firebolt.io and handcrafted by our friends over at fame.so.
Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of Fundamentals of Data Engineering, Zach Wilson of EcZachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb, and Xiaoxu Gao of Adyen.
Check out our three most downloaded episodes:
• Zach Wilson on What Makes a Great Data Engineer
• Joe Reis and Matt Housley on The Fundamentals of Data Engineering
• Bill Inmon, The Godfather of Data Warehousing

Jun 10, 2025 • 22min
From Zero to 100M Users: Inside Notion’s Data Stack and AI Strategy with Sumit Gupta
In this discussion, Sumit Gupta, Lead BI Engineer at Notion, shares insights from his journey through tech giants like Snowflake and Dropbox. He highlights how modern data stacks are evolving with tools like dbt and Iceberg, while emphasizing the shift from technical skills to crucial transferable skills in the AI era. Sumit explains how AI is revolutionizing workflows and automating content creation, stressing the importance of balancing automation with genuine human connections. He also provides tips on adapting to the rapid changes in data and AI technologies.

May 7, 2025 • 32min
How Rising Wave Is Redefining Real-Time Data with Postgres Power
In this episode of The Data Engineering Show, host Benjamin and co-host Eldad sit down with Yingjun Wu, founder and CEO of Rising Wave, to explore the evolution of stream processing systems and the innovations his company is bringing to the space.
What you’ll learn:
• Yingjun's journey from academic research in stream processing to founding Rising Wave, and the challenges of building trust in a new database system.
• How Rising Wave's architecture, which uses S3 as primary storage, lets it scale in seconds, while other systems can take hours to scale.
• The competitive landscape of stream processing, where Rising Wave's Postgres compatibility provides a significant advantage in ease of use.
• How one major company reduced its CPU requirements from 20,000 to just 600 by switching from a traditional stream processing system to Rising Wave.
• The growing importance of Apache Iceberg as a destination for stream processing output, helping companies avoid vendor lock-in.
• How streaming systems fit into modern data stacks, especially as companies seek to avoid being locked into proprietary systems.
Yingjun Wu is the founder and CEO of Rising Wave, a stream processing system built in Rust and designed with a cloud-native architecture. With a PhD focused on stream processing and database systems, Yingjun previously worked on Amazon Redshift and at IBM Research before founding Rising Wave. His company has developed a system that achieves significant performance and resource-efficiency advantages over traditional stream processing solutions while maintaining Postgres compatibility for ease of use.
Episode Highlights:
The Origins of Rising Wave (00:30)
Yingjun shares his background in stream processing from his PhD days and explains how his experience at Redshift revealed the need for better stream processing solutions, especially since many data warehouse workloads involve data ingested from streaming sources like Kinesis or Kafka.
Building a System from Scratch (04:10)
Yingjun describes the challenging first two to three years of developing Rising Wave without customers, highlighting how trust is a major barrier for new database systems. After 2.5 years, they secured their first customers, including a startup and several larger companies, which helped establish Rising Wave's credibility.
The Current Stream Processing Landscape (07:47)
Benjamin asks about the current stream processing space, and Yingjun positions Rising Wave as a leader, particularly for SQL-based workloads. He highlights several key advantages of Rising Wave, including its Rust-based implementation and S3-based storage architecture.
S3 as Primary Storage (10:27)
Yingjun explains the decision to use S3 as primary storage from day one, despite its latency and cost. He discusses how they've optimized around these challenges and would still make the same architectural choice today because of benefits like simplified state management and superior elastic scaling.
The Business Model (13:52)
Rising Wave offers open-source, cloud, and on-premises versions of its product. Yingjun notes that many highly regulated industries require on-premises deployment, including customers in the banking and aerospace sectors.
Typical Users and Competitive Advantages (15:01)
When asked about typical users, Yingjun explains that Rising Wave competes directly with Flink but has an advantage in ease of use thanks to its Postgres compatibility. Its users are either new to stream processing or are migrating from systems like Spark Streaming or Flink because of performance issues or development complexity.
Apache Iceberg Integration (19:25)
Yingjun discusses how Apache Iceberg is emerging as an important destination for Rising Wave output, as companies seek to avoid vendor lock-in with proprietary data warehouses. He explains how Rising Wave typically performs ETL before data is written to Iceberg tables.
The Future of Data Management (32:06)
The conversation concludes with a discussion about Iceberg becoming a "single source of truth" for data, with multiple specialized query engines potentially accessing the same data. Yingjun and Eldad share perspectives on how this shift away from proprietary data lock-in is changing the data ecosystem.
Episode Resources:
• Rising Wave Website
• Yingjun Wu LinkedIn

Apr 8, 2025 • 24min
Revolutionizing Data Governance with DataStrato’s Unified Open Source Approach
Lisa Cao, Product Manager at DataStrato, dives into the world of data governance, sharing her expertise in AI/ML and open-source frameworks. The discussion highlights Apache Gravitino's capabilities for unified governance across diverse data systems. The conversation also covers the 'Push-Down Permission Management' model, which is essential for security, and the growing trend toward open ecosystems that prioritize flexibility. Finally, Lisa emphasizes the importance of real-world tool adoption over social media hype in keeping data engineers agile in a fast-paced landscape.

Mar 19, 2025 • 31min
Database Technology in the Age of AI with DuckDB Labs co-creator Hannes Mühleisen
Hannes Mühleisen, CEO of DuckDB Labs and a professor in the Netherlands, discusses the journey of DuckDB, an open-source analytical database that now sees 10 million monthly downloads. He explains how DuckDB differs from SQLite and highlights its analytical capabilities. Hannes also dives into DuckDB's flexible ecosystem, which allows users to add custom functionality. A discussion of AI's impact on database management explores the balance between traditional SQL usage and newer technological advances.

Feb 11, 2025 • 31min
AI and Data Movement: Trends and Best Practices with Estuary’s Daniel Pálma
Daniel Pálma, Head of Marketing at Estuary, shares his journey from data engineer to marketing professional. He discusses how his tech background enriches his marketing strategies. The conversation dives into the impact of AI on data movement and how cloud solutions are transforming data integration. Daniel highlights the rise of vector databases and structured data challenges in AI, as well as the promising future of Apache Iceberg in data lakehouses. Finally, he emphasizes the growing need for data practitioners in this golden age of data.

Jan 7, 2025 • 37min
AI and Data Change Management with Chad Sanderson, CEO Gable AI
In this episode of The Data Engineering Show, host Benjamin and co-host Eldad sit down with Chad Sanderson, CEO and co-founder of Gable AI, to explore the world of data change management.
Join them as they:
• Delve into the challenges of data quality, how it degrades over time, and the one-sided data quality checks on the “last mile” of the data supply chain.
• Talk about how Gable works through three layers of technology: identifying where data is produced, tracing how it flows, and communicating the impact of changes before they reach production.
• Explain why the gap between data producers and consumers needs to be bridged, and why Gable emphasizes effective communication and a shared understanding of data change management across teams.
• Shed light on how AI can enhance data management by extracting semantics from code and managing the translated output effectively.
• Discuss Chad’s vision for 2025, which is to help companies start caring about data and about how changes to data affect other people.
Chad Sanderson is the CEO and co-founder of Gable AI, a data change management platform. Chad has over a decade of experience in the data engineering and infrastructure space, holding significant roles at major companies such as Microsoft, Oracle, and Sephora, where he focused on data quality and governance challenges. He is the former Head of Data at Convoy, a LinkedIn writer, and a published author. He lives in Seattle, Washington, and is the Chief Operator of the Data Quality Camp. His journey from data scientist to data engineer and ultimately to CEO was driven by a desire to transform how organizations manage and use data. Gable AI addresses the complexities of the data supply chain by providing tools for code scanning, data contracts, and governance as code, enabling teams to manage data changes and their impact proactively.
If you enjoyed this episode, make sure to subscribe, rate, and review it on Apple Podcasts, Spotify, and YouTube.
Episode Resources
• Gable AI website
• Chad Sanderson on LinkedIn

Nov 26, 2024 • 25min
Tech Stacks and Tradeoffs: Xudo's Founder on Picking the Right Tools for BI Success
Wouter Trappers, the founder of Xudo, shares his slightly unconventional path from philosopher to data consultant with the Bros in this latest episode of The Data Engineering Show. Wouter’s grounding in philosophy has shaped his approach to business intelligence. For Wouter, BI is much more than a software solution: it is about change management and aligning leadership with data projects.
They discuss:
• From Excel to Expert: Starting with basic Excel tasks and progressing to full mastery of BI tools like QlikView, Wouter has blended his technical and philosophical approaches to data to become a bona fide expert.
• Data Strategy as Transformation: A BI project bears fruit only if it follows good change management principles. Focus on leadership alignment, KPI clarity, and user empowerment instead of simply implementing software.
• Challenges of Starting Small: Wouter offers tips for smaller companies on bootstrapping their data journey using existing tools, practical education, and even GenAI.
• Balancing Scales: Small startups and large enterprises face very different sets of challenges.
Wouter’s combination of philosophy and pragmatism brings fresh takes to building effective data solutions.

Oct 31, 2024 • 28min
Data Rewind: Conversation Highlights from Zach Wilson, Matthew Housley, Joe Reis, and Krishnan Viswanathan
In this special roundup episode of The Data Engineering Show, the Bros revisit some of the best moments from episodes with data thought leaders Zach Wilson, Matthew Housley, Joe Reis, and Krishnan Viswanathan, spotlighting essential trends and lessons learned across the evolving data engineering landscape. From data observability to bridging academia with real-world practice, this episode covers perspectives on where data engineering is heading and why certain challenges persist.
Topics include:
• Foundations of Data Engineering: Zach Wilson emphasizes the importance of core, tech-agnostic skills in data modeling, quality assurance, and storytelling. Drawing on his experiences at Airbnb and in education, he argues that effective data engineering hinges on robust data models, quality controls, and persuasive narratives rather than expertise in any single tool or language.
• Bridging Academia and Practice: Matthew Housley and Joe Reis delve into the need for better data education, emphasizing hands-on experience and data fundamentals over tool-specific training, and advocate for apprenticeships and real-world collaborations in educational settings.
• Legacy Meets Modern in Data Engineering: Krishnan Viswanathan reflects on recurring themes in data engineering and the importance of adapting legacy approaches to new data needs, underscoring the challenges and benefits of vendor-built versus in-house solutions.
Join the Bros for a well-rounded exploration of current themes in data engineering, filled with practical advice for data professionals at any stage of their journey.

Sep 24, 2024 • 33min
The Resurgence of SQL: Insights from Ryanne Dolan from LinkedIn
In this episode of The Data Engineering Show, the Bros, Eldad and Benjamin, are joined by Ryanne Dolan from LinkedIn to discuss the innovative Hoptimator (H2) project. The conversation reveals how LinkedIn has improved its data pipelines by automating the setup and management of complex workflows.
Together they cover:
• Automated Data Pipelines: Ryanne explains how Hoptimator lets users create and manage data pipelines with just a simple SQL SELECT query, streamlining the setup of Kafka topics, Flink jobs, and schemas.
• Integration with Kubernetes: The project uses Kubernetes to handle infrastructure tasks, treating Kubernetes as a database for managing state. This integration simplifies the orchestration of data workflows and automates routine tasks.
• Consumer-Driven Model: Ryanne discusses the shift from a producer-driven to a consumer-driven data model, emphasizing the importance of understanding and addressing consumer needs to reduce engineering complexity and optimize data systems.
• Future of Data Engineering: The conversation touches on the ongoing experimental nature of Hoptimator and its potential to transform data engineering practices, highlighting its impact on LinkedIn's data infrastructure.