The Infra Pod

From Spark to Eventual: Reinventing Data for the AI Era (Chat with Sammy from Eventual)

Dec 15, 2025
Sammy Sidhu, CEO of Eventual and creator of the Daft platform, discusses the challenges of processing unstructured and multimodal data. He shares his journey from self-driving car research to founding Eventual, detailing his frustrations with Spark. Sammy discusses how to maintain data integrity in complex workflows and explores the impact of LLMs and agents on data pipelines. With insights into real-world applications, he highlights the shift toward accessible multimodal tools and the future of data engineering.
ANECDOTE

Middle-Of-The-Night Spark Breaking Point

  • Sammy Sidhu describes debugging JVM logs at 2 AM while trying to run a model over images from a self-driving car.
  • That frustration led him to build an alternative to Spark tailored for unstructured, multimodal data.
INSIGHT

Data Inflation Vs. OLAP Shrinkage

  • Multimodal pipelines inflate data as you process it, unlike OLAP, where queries shrink data through aggregation.
  • Engines optimized for row/column analytics (e.g., Spark) break when each step multiplies data volume and memory needs.
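
The contrast above can be illustrated with a small sketch. It is not taken from the episode: the frame size, the zlib-compressed buffer standing in for an encoded image, and the function names are all assumptions chosen for illustration, and real image codecs will show different ratios.

```python
import zlib

# Hypothetical stand-in for an encoded image: a compressed buffer of raw pixels.
# A synthetic all-black frame compresses far better than a real photo would.
WIDTH, HEIGHT, CHANNELS = 640, 480, 3
raw_pixels = bytes(WIDTH * HEIGHT * CHANNELS)
encoded = zlib.compress(raw_pixels)


def decode(blob: bytes) -> bytes:
    """Multimodal-style step: the output is larger than the input."""
    return zlib.decompress(blob)


def aggregate(rows: list[int]) -> int:
    """OLAP-style step: many rows in, one value out."""
    return sum(rows)


decoded = decode(encoded)
rows = list(range(1_000_000))
total = aggregate(rows)

print(f"decode:    {len(encoded)} bytes in -> {len(decoded)} bytes out")
print(f"aggregate: {len(rows)} rows in -> 1 value out ({total})")
```

An engine that sizes memory for the input of each step is comfortable with the second pattern and can be caught out by the first, which is the failure mode the snip describes.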
INSIGHT

Preserve Nesting To Retain Meaning

  • Documents and video are highly nested and require diverse model types, so exploding them loses lineage and context.
  • Keeping nested structure and offering primitives to compute on it preserves meaning and simplifies pipelines.
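
A minimal sketch of the difference, using plain Python dictionaries rather than any real engine's API. The document layout and the helper names `explode` and `map_nested` are hypothetical, introduced only to illustrate the idea.

```python
# Hypothetical nested document: pages contain paragraphs.
doc = {
    "id": "doc-1",
    "pages": [
        {"number": 1, "paragraphs": ["Intro text", "More intro"]},
        {"number": 2, "paragraphs": ["Results"]},
    ],
}


def explode(document: dict) -> list[str]:
    """Flatten to bare paragraphs; page and document lineage is dropped."""
    return [p for page in document["pages"] for p in page["paragraphs"]]


def map_nested(document: dict, fn) -> dict:
    """Apply fn to each paragraph while keeping the document/page structure."""
    return {
        "id": document["id"],
        "pages": [
            {
                "number": page["number"],
                "paragraphs": [fn(p) for p in page["paragraphs"]],
            }
            for page in document["pages"]
        ],
    }


flat = explode(doc)            # no record of which page each string came from
lengths = map_nested(doc, len) # per-paragraph results, still grouped by page

print(flat)
print(lengths)
```

After `explode`, recovering which page a paragraph belonged to requires carrying extra key columns through every later step; the nested version keeps that context without additional bookkeeping.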