Data Engineering Podcast

Latest episodes

10 snips
Mar 30, 2025 • 44min

Overcoming Redis Limitations: The Dragonfly DB Approach

Roman Gershman, CTO and founder of Dragonfly DB, shares his journey from Google to creating a high-speed alternative to Redis. He dives into the challenges of developing in-memory databases, focusing on performance, scalability, and cost efficiency. Roman discusses operational complexities users face, while highlighting Dragonfly's compatibility with Redis and innovations like SSD tiering. He also explores programming trade-offs between C++ and Rust, emphasizing adaptability in database development and the importance of community feedback in shaping future advancements.
7 snips
Mar 24, 2025 • 53min

Bringing AI Into The Inner Loop of Data Engineering With Ascend

Sean Knapp, Founder and CEO of Ascend.io, shares his expertise in data engineering and AI's transformative role. He discusses how AI can streamline workflows, alleviate burdens for data engineers, and enhance productivity by automating tasks. Sean highlights challenges like data governance and the integration of AI into existing systems. The conversation also touches on bridging the gap between junior and senior engineers using AI as a collaborative tool, as well as the future potential of AI to revolutionize data engineering processes.
18 snips
Mar 16, 2025 • 52min

Astronomer's Role in the Airflow Ecosystem: A Deep Dive with Pete DeJoy

Pete DeJoy, co-founder and product lead at Astronomer, shares his extensive experience with Airflow, discussing its evolution and upcoming enhancements in Airflow 3. He highlights Astronomer's commitment to improving data operations and community involvement. The conversation dives into the critical role of data observability through Astro Observe, innovative use cases like the Texas Rangers' in-game analytics, and the shifting landscape of data engineering roles, emphasizing collaboration and advanced tooling in the modern data ecosystem.
Mar 8, 2025 • 56min

Accelerated Computing in Modern Data Centers With Datapelago

Summary

In this episode of the Data Engineering Podcast, Rajan Goyal, CEO and co-founder of Datapelago, talks about improving efficiencies in data processing by reimagining system architecture. Rajan explains the shift from hyperconverged to disaggregated and composable infrastructure, highlighting the importance of accelerated computing in modern data centers. He discusses the evolution from proprietary to open, composable stacks, emphasizing the role of open table formats and the need for a universal data processing engine, and outlines Datapelago's strategy to leverage existing frameworks like Spark and Trino while providing accelerated computing benefits.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
Your host is Tobias Macey and today I'm interviewing Rajan Goyal about how to drastically improve efficiencies in data processing by re-imagining the system architecture.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by outlining the main factors that contribute to performance challenges in data lake environments?
The different components of open data processing systems have evolved from different starting points with different objectives. In your experience, how has that unplanned and unsynchronized evolution of the ecosystem hindered the capabilities and adoption of open technologies?
The introduction of a new cross-cutting capability (e.g. Iceberg) has typically taken a substantial amount of time to gain support across different engines and ecosystems. What do you see as the point of highest leverage to improve the capabilities of the entire stack with the least amount of coordination?
What was the motivating insight that led you to invest in the technology that powers Datapelago?
Can you describe the system design of Datapelago and how it integrates with existing data engines?
The growth in the generation and application of unstructured data is a notable shift in the work being done by data teams. What are the areas of overlap in the fundamental nature of data (whether structured, semi-structured, or unstructured) that you are able to exploit to bridge the processing gap?
What are the most interesting, innovative, or unexpected ways that you have seen Datapelago used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Datapelago?
When is Datapelago the wrong choice?
What do you have planned for the future of Datapelago?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Datapelago
MIPS Architecture
ARM Architecture
AWS Nitro
Mellanox
Nvidia
Von Neumann Architecture
TPU == Tensor Processing Unit
FPGA == Field-Programmable Gate Array
Spark
Trino
Iceberg (Podcast Episode)
Delta Lake (Podcast Episode)
Hudi (Podcast Episode)
Apache Gluten
Intermediate Representation
Turing Completeness
LLVM
Amdahl's Law
LSTM == Long Short-Term Memory

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
82 snips
Feb 26, 2025 • 60min

The Future of Data Engineering: AI, LLMs, and Automation

Gleb Mezhanskiy, CEO and co-founder of Datafold, shares insights from his journey in data engineering and the integration of AI. He discusses how large language models can streamline code writing, improve data accessibility, and facilitate testing and code reviews. Mezhanskiy emphasizes the challenges at the intersection of AI and data workflows, advocating for continuous adaptation. With practical applications like text-to-SQL and enhanced data observability, he paints an optimistic picture for the future of data engineering.
30 snips
Feb 16, 2025 • 39min

Evolving Responsibilities in AI Data Management

Bartosz Mikulski, an MLOps engineer with a rich background in data engineering, dives deep into the realm of AI data management. He highlights the crucial role of data testing in AI applications, especially with the rise of generative AI. Bartosz discusses the need for specialized datasets and the skills required for data engineers to transition into AI. He also addresses challenges like frequent data reprocessing and unstructured data handling, showcasing the evolving responsibilities in this fast-paced field.
27 snips
Jan 13, 2025 • 55min

CSVs Will Never Die And OneSchema Is Counting On It

Andrew Luo, CEO of OneSchema, shares his expertise in data engineering and CRM migration, focusing on the enduring relevance of CSVs. He discusses the common challenges of CSV data, such as inconsistency and lack of standards, and explains how OneSchema uses AI for improved type validation and parsing. Andrew highlights OneSchema's potential to streamline data imports and boost efficiency, particularly for non-technical users. He also reveals plans for future innovations, including industry-specific transformation packs to enhance data management further.
40 snips
Jan 3, 2025 • 58min

Breaking Down Data Silos: AI and ML in Master Data Management

Dan Bruckner, Co-founder and CTO of Tamr and former CERN physicist, shares his insights into master data management (MDM) enhanced by AI and machine learning. He discusses his transition from physics to data science, highlighting challenges in reconciling large data sets. Dan explains how data silos form within organizations and emphasizes the role of large language models in improving user experience and data trust. He advocates for combining AI capabilities with human oversight to ensure accuracy while tackling complex data management issues.
61 snips
Dec 23, 2024 • 50min

Building a Data Vision Board: A Guide to Strategic Planning

Lior Barak, a data expert with 15 years of experience in data product strategy, shares invaluable insights on strategic planning in data management. He introduces the concept of a 'data vision board' as a tool for organizations to align their data strategies with regulatory and stakeholder needs. Lior emphasizes the importance of balancing immediate demands with long-term goals, quantifying data issues for prioritization, and maintaining a flexible, living strategic plan. His practical advice encourages data teams to transition from mere enablers to impactful creators.
40 snips
Dec 16, 2024 • 60min

How Orchestration Impacts Data Platform Architecture

Hugo Lu, CEO and co-founder of Orchestra, delves into the vital role of data orchestration in platform architecture. He highlights how the choice of orchestration engines influences data flow management and overall efficiency. The discussion covers the evolution of orchestration from early models to modern applications like Kubernetes, reveals the challenges of traditional systems, and emphasizes the need for flexibility in architecture. Lu also addresses the distinct demands of analytical versus product-oriented applications, especially with the rise of AI integration.
