Data Engineering Podcast

From Academia to Industry: Bridging Data Engineering Challenges

Aug 26, 2025
In this discussion, Professor Paul Groth of the University of Amsterdam shares his expertise in AI systems and intelligent data engineering. He traces the evolution of data provenance and lineage and explains why they matter in today's data workflows. Paul also highlights how large language models are changing knowledge graph construction and data integration. The conversation covers the relationship between academia and industry, with an emphasis on human-AI collaboration and the need for data management solutions tailored to specific use cases.
ANECDOTE

Journey From AI To Data Integration

  • Paul Groth started in AI, then shifted to distributed computing and data provenance during his PhD.
  • He later built early graph databases and a large biomedical knowledge graph, Open PHACTS, which integrated roughly 20 sources.
INSIGHT

Lineage Expanded Beyond Databases

  • Paul treats data provenance and lineage as largely the same thing, but notes that the field has broadened from database internals to tracing workflows across systems.
  • Modern interest focuses on tracing results across the organization, not just within one database.
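The episode does not name a specific lineage tool. As a minimal sketch, assuming lineage is recorded as a mapping from each artifact to its direct inputs (possibly living in different systems), tracing a result back through the organization amounts to walking that graph upstream. The artifact names and the `upstream` helper below are hypothetical, chosen only for illustration.

```python
from collections import deque

# Hypothetical lineage records: each artifact maps to the artifacts it was
# directly derived from. Names span several systems (dashboard, warehouse,
# CRM, raw exports) to mimic cross-system tracing; they are illustrative only.
LINEAGE: dict[str, list[str]] = {
    "dashboard.revenue_report": ["warehouse.revenue_daily"],
    "warehouse.revenue_daily": ["warehouse.orders_clean", "crm.customers"],
    "warehouse.orders_clean": ["raw.orders_export"],
    "crm.customers": [],
    "raw.orders_export": [],
}


def upstream(artifact: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return every artifact that `artifact` depends on, breadth-first, without repeats."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(lineage.get(artifact, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # shared inputs are reported only once
        seen.add(node)
        order.append(node)
        queue.extend(lineage.get(node, []))
    return order


if __name__ == "__main__":
    # Prints the upstream sources of the report, nearest first.
    print(upstream("dashboard.revenue_report", LINEAGE))
```

Breadth-first order lists the nearest inputs first, which is usually what someone debugging a surprising number wants to check before looking at raw sources.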
ADVICE

Automate Graphs And Test Data Quality

  • Automate knowledge graph construction and mappings when possible to lower integration cost.
  • Invest in data quality and measure how data prep choices affect downstream ML models.
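One common way to automate graph construction is a declarative mapping from source columns to graph predicates. Below is a minimal sketch, assuming tabular records and a hand-written mapping. The `apply_mapping` helper, the `ex:` prefixes, and the sample drug rows are hypothetical, and this is not the approach described in the episode. It also counts missing cells per column as a simple data-quality signal you could track alongside downstream model metrics.

```python
from collections import Counter
from typing import Callable

Triple = tuple[str, str, str]
# Hypothetical mapping format: column -> (predicate, function turning a cell into an object term)
Mapping = dict[str, tuple[str, Callable[[str], str]]]


def apply_mapping(
    rows: list[dict[str, str]], subject_template: str, mapping: Mapping
) -> tuple[list[Triple], Counter]:
    """Turn each row into (subject, predicate, object) triples and count missing cells per column."""
    triples: list[Triple] = []
    missing: Counter = Counter()
    for row in rows:
        subject = subject_template.format(**row)
        for column, (predicate, to_object) in mapping.items():
            value = row.get(column)
            if value in (None, ""):
                missing[column] += 1  # record the gap instead of emitting an empty triple
                continue
            triples.append((subject, predicate, to_object(value)))
    return triples, missing


if __name__ == "__main__":
    rows = [
        {"id": "D1", "name": "aspirin", "target": "PTGS1"},
        {"id": "D2", "name": "ibuprofen", "target": ""},
    ]
    mapping: Mapping = {
        "name": ("ex:name", lambda v: f'"{v}"'),                # literal value
        "target": ("ex:targets", lambda v: f"ex:protein/{v}"),  # link to another entity
    }
    triples, missing = apply_mapping(rows, "ex:drug/{id}", mapping)
    for t in triples:
        print(t)
    print(dict(missing))
```

Keeping the mapping as data rather than code means a new source only needs a new mapping entry, which is where automation (including LLM-suggested mappings) could plug in.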