How Cartesia Edges Out The Big Labs With Audio AI Models, with Karan Goel, Founder and CEO at Cartesia

Mar 26, 2025

Karan Goel, Co-founder and CEO of Cartesia, dives into the future of voice AI and the groundbreaking use of state space models (SSMs) for audio applications. He details his transition from academia at CMU and Stanford to entrepreneurship and explains the efficiency advantages of SSMs over transformer-based models. Karan also reveals how Cartesia is developing Sonic, an ultra-low-latency text-to-speech model, and discusses why rapid execution matters for a voice AI startup.


ANECDOTE

Karan's Path to Cartesia

Karan Goel's path took him from IIT to CMU and Stanford, driven by a passion for AI and gaming.
His PhD work with Chris Ré and Albert Gu led to the founding of Cartesia, which builds on state-space models.

INSIGHT

SSMs vs. Transformers

State-space models (SSMs) offer subquadratic scaling with context, unlike transformers' quadratic scaling.
This makes SSMs more efficient for large context windows and long-running AI systems.
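To make the scaling claim concrete, here is a minimal sketch assuming the simplest possible SSM: a linear, time-invariant recurrence with a small diagonal transition matrix. `ssm_scan` and `attention_score_count` are hypothetical names for illustration; this is not Cartesia's model, and production SSMs such as S4 and Mamba use more elaborate parameterizations and training algorithms. The sketch shows two things: the scan does a fixed amount of work per timestep, so cost grows linearly with sequence length rather than with its square, and the state carried between steps has a fixed size, so a long-running stream can be processed chunk by chunk.

```python
import numpy as np

def ssm_scan(A, B, C, u, x0=None):
    """Run a discrete linear state space model over a 1-D input sequence u.

    Recurrence: x_k = A @ x_{k-1} + B * u_k,  y_k = C @ x_k.
    Each step costs O(N^2) for state size N, so the whole scan is linear in len(u).
    The final state is returned so a later chunk can resume where this one stopped.
    """
    x = np.zeros(A.shape[0]) if x0 is None else x0
    y = np.empty(len(u))
    for k, u_k in enumerate(u):
        x = A @ x + B * u_k  # fixed-size state summarizes all past input
        y[k] = C @ x
    return y, x

def attention_score_count(L):
    """Full self-attention scores every position against every position: L * L."""
    return L * L

rng = np.random.default_rng(0)
N, L = 4, 1000
A = np.diag(rng.uniform(0.5, 0.95, N))  # stable diagonal transition (illustrative)
B = rng.standard_normal(N)
C = rng.standard_normal(N)
u = rng.standard_normal(L)  # stand-in for an audio-like 1-D signal

# Process the whole sequence at once.
y_full, _ = ssm_scan(A, B, C, u)

# Process it as two streamed chunks, carrying only the N-dimensional state across.
y_first, state = ssm_scan(A, B, C, u[:500])
y_second, _ = ssm_scan(A, B, C, u[500:], x0=state)

print(np.allclose(np.concatenate([y_first, y_second]), y_full))  # True
print(L, attention_score_count(L))  # 1000 1000000
```

The chunked run reproduces the full run because the N-dimensional state is the only thing the recurrence remembers. A transformer decoding the same stream would instead keep a key-value cache that grows with every token.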

INSIGHT

SSMs and Data Types

SSMs excel at processing signal data such as audio and video, because that kind of data is highly compressible.
Their effectiveness in language modeling remains uncertain due to the lack of large-scale text SSMs.
