Join Sarah Catanzaro, a general partner at Amplify Partners with a focus on data and AI infrastructure, as she discusses the evolving landscape of AI startups. She shares insights on the impact of the DBT-Fivetran merger and how data tools are vital for frontier labs. Sarah critiques the trend of massive seed funding without clear roadmaps while highlighting when such raises are warranted. Delve into exciting topics like memory management, personalization challenges in AI products, and the true essence of real-world training environments.
28:42
INSIGHT
Merger Signals Growth, Not Collapse
The DBT–Fivetran merger wasn't the 'end' of the modern data stack; both companies were growing healthily.
The merger accelerates the path to IPO by combining complementary strengths and scale.
INSIGHT
Frontier Labs Use Traditional Data Tools
Frontier AI labs rely on data tooling like DBT and Fivetran to manage training datasets and user interactions.
Analyzing agent/LLM interactions is more complex, so robust data tooling remains critical.
INSIGHT
Catalogs Failed For Humans, Not Machines
General-purpose data catalogs underperformed because core platforms added sufficient metadata features.
There may be a bigger opportunity in machine-centric metadata and governance than human discoverability.
Sarah Catanzaro has gone from investing through the modern data stack era (DBT, Fivetran, and the analytics explosion) to investing at the frontier of AI infrastructure and applications at Amplify Partners. She has spent years at the intersection of data, compute, and intelligence, watching categories emerge, merge, and occasionally disappoint. We caught up with Sarah live at NeurIPS 2025 to dig into the state of AI startups heading into 2026. Topics include why $100M+ seed rounds with no near-term roadmap are now the norm (and why that terrifies her); what the DBT-Fivetran merger really signals about the modern data stack (spoiler: it's not dead, just ready for IPO); how frontier labs are using DBT and Fivetran to manage training data and agent analytics at scale; why data catalogs failed as standalone products but might succeed as metadata services for agents; the consumerization of AI and why personalization (memory, continual learning, K-factor) is the 2026 unlock for retention and growth; and why she thinks RL environments are a fad and real-world logs beat synthetic clones every time. She also shares her thesis for the most exciting AI startups: companies that marry hard research problems (RAG, rule-following, continual learning) with killer applications that were simply impossible before.
We discuss:
The DBT-Fivetran merger: not the death of the modern data stack, but a path to IPO scale (targeting $600M+ combined revenue) and a signal that both companies were already winning their categories
How frontier labs use data infrastructure: DBT and Fivetran for training data curation, agent analytics, and managing increasingly complex interactions—plus the rise of transactional databases (RocksDB) and efficient data loading (Vortex) for GPU-bound workloads
Why data catalogs failed: built for humans when they should have been built for machines, focused on discoverability when the real opportunity was governance, and ultimately subsumed as features inside Snowflake, DBT, and Fivetran
The $100M+ seed phenomenon: raising massive rounds at billion-dollar valuations with no 6-month roadmap, seven-day decision windows, and founders optimizing for signal ("we're a unicorn") over partnership or dilution discipline
Why world models are overhyped but underspecified: three competing definitions, unclear generalization across use cases (video games ≠ robotics ≠ autonomous driving), and a research problem masquerading as a product category
The 2026 theme: consumerization of AI via personalization—memory management, continual learning, and solving retention/churn by making products learn skills, preferences, and adapt as the world changes (not just storing facts in cursor rules)
Why RL environments are a fad: labs are paying 7–8 figures for synthetic clones when real-world logs, traces, and user activity (à la Cursor) are richer, cheaper, and more generalizable
Sarah's investment thesis: research-driven applications that solve hard technical problems (RAG for Harvey, rule-following for Sierra, continual learning for the next killer app) and unlock experiences that were impossible before
Infrastructure bets: memory, continual learning, stateful inference, and the systems challenges of loading/unloading personalized weights at scale
Why K-factor and growth fundamentals matter again: AI felt magical in 2023–2024, but as the magic fades, retention and virality are back—and most AI founders have never heard of K-factor
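For listeners who have not met K-factor before, here is a minimal sketch of the standard viral-coefficient arithmetic, which is not taken from the episode itself. It assumes the common textbook definition (invites sent per user multiplied by the conversion rate of those invites). The function names and the sample numbers are hypothetical and used only for illustration.

```python
def k_factor(invites_per_user: float, invite_conversion_rate: float) -> float:
    """Viral coefficient under the common definition:
    average invites each user sends, times the fraction of invites that convert."""
    return invites_per_user * invite_conversion_rate


def project_users(seed_users: int, k: float, cycles: int) -> float:
    """Cumulative users after a number of invite cycles.

    Simplifying assumptions (hypothetical model): k stays constant,
    nobody churns, and only the newest cohort sends invites each cycle.
    """
    total = float(seed_users)
    newest_cohort = float(seed_users)
    for _ in range(cycles):
        newest_cohort *= k      # users recruited by the previous cohort
        total += newest_cohort
    return total


if __name__ == "__main__":
    # Illustrative numbers only: 4 invites per user, 25% of invites convert.
    k = k_factor(4, 0.25)
    print(f"K-factor: {k}")
    print(f"Users after 3 cycles from 100 seeds: {project_users(100, k, 3)}")
```

Under this simple model, a K-factor above 1 means each cohort recruits a larger one and growth compounds, while a K-factor below 1 means invite-driven growth tapers off and retention has to carry more of the load.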