
The Infra Pod
From Spark to Eventual: Reinventing Data for the AI Era (Chat with Sammy from Eventual)
Dec 15, 2025
Sammy Sidhu, CEO of Eventual and creator of the Daft platform, digs into the challenges of processing unstructured and multimodal data. He shares his journey from self-driving car research to founding Eventual, detailing his frustrations with Spark. Sammy discusses how to maintain data integrity in complex workflows and explores the impact of LLMs and agents on data pipelines. Drawing on real-world applications, he highlights the shift toward accessible multimodal tools and the future of data engineering.
AI Snips
Middle-Of-The-Night Spark Breaking Point
- Sammy Sidhu describes debugging JVM logs at 2 AM while trying to run a model over images from a self-driving car.
- That frustration led him to build an alternative to Spark tailored for unstructured, multimodal data.
Data Inflation Vs. OLAP Shrinkage
- Multimodal pipelines inflate data as you process it, unlike OLAP where queries shrink data through aggregation.
- Engines optimized for row/column analytics (e.g., Spark) break when each step multiplies data volume and memory needs.
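The contrast above can be made concrete with a rough sketch in plain Python. Nothing here is Daft or Spark API: the helpers `make_encoded_frame`, `decode`, and `chunk` are hypothetical stand-ins, and zlib-compressed zero bytes stand in for an encoded image. The point is only to show an aggregation producing fewer rows than it reads, while a decode-then-split step produces more bytes and more rows than it reads.

```python
import zlib

# Hypothetical stand-in for an encoded image: a compressed, all-zero
# 640x480 RGB frame. Real image codecs differ; this only mimics the size gap.
def make_encoded_frame(width=640, height=480):
    raw = bytes(width * height * 3)
    return zlib.compress(raw)

def decode(encoded):
    # "Decoding" expands the small encoded payload into raw pixel bytes.
    return zlib.decompress(encoded)

def chunk(raw, chunk_bytes=64 * 64 * 3):
    # Split the raw bytes into fixed-size pieces, standing in for crops or patches.
    return [raw[i:i + chunk_bytes] for i in range(0, len(raw), chunk_bytes)]

# OLAP-style step: five input rows aggregate down to two output rows.
sales = [("us", 10), ("us", 5), ("eu", 7), ("eu", 3), ("eu", 1)]
totals = {}
for region, amount in sales:
    totals[region] = totals.get(region, 0) + amount

# Multimodal-style step: one small input row inflates into many large ones.
encoded = make_encoded_frame()
raw = decode(encoded)
chunks = chunk(raw)

print(f"aggregation: {len(sales)} rows in, {len(totals)} rows out")
print(f"decode: {len(encoded)} bytes in, {len(raw)} bytes out")
print(f"chunking: 1 row in, {len(chunks)} rows out")
```

An engine that sizes its memory budget on the assumption that each stage emits less than it consumes would be caught out by the second half of this sketch.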
Preserve Nesting To Retain Meaning
- Documents and video are highly nested and require diverse model types, so exploding them loses lineage and context.
- Keeping nested structure and offering primitives to compute on it preserves meaning and simplifies pipelines.
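As a minimal illustration of the point above, the following plain-Python sketch compares flattening a nested document with mapping a function over it in place. The document layout and the helpers `explode` and `map_nested` are assumptions made for this example, not part of Daft or any other library.

```python
# Hypothetical nested document: pages contain paragraphs.
doc = {
    "doc_id": "report-1",
    "pages": [
        {"page": 1, "paragraphs": ["Intro text", "More intro"]},
        {"page": 2, "paragraphs": ["Results"]},
    ],
}

def explode(doc):
    # Flatten to bare paragraphs. The page and document each one
    # came from are no longer recoverable from the output.
    return [text for page in doc["pages"] for text in page["paragraphs"]]

def map_nested(doc, fn):
    # Apply fn to every paragraph while keeping the document's shape,
    # so each result stays attached to its page and document.
    return {
        "doc_id": doc["doc_id"],
        "pages": [
            {"page": page["page"], "paragraphs": [fn(t) for t in page["paragraphs"]]}
            for page in doc["pages"]
        ],
    }

flat = explode(doc)
lengths = map_nested(doc, len)

print(flat)
print(lengths)
```

With `explode`, lineage has to be reattached by hand, for example by carrying ID columns through every later step. With `map_nested`, the output has the same shape as the input, so a downstream step can still ask which page a result belongs to.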
