

How Cartesia Edges Out The Big Labs With Audio AI Models, with Karan Goel, Founder and CEO at Cartesia
Mar 26, 2025
Karan Goel, Co-founder and CEO of Cartesia, dives into the future of voice AI and the use of state space models (SSMs) for audio applications. He details his transition from academia at CMU and Stanford to entrepreneurship, emphasizing the efficiency of SSMs relative to transformers. Karan also explains how Cartesia is developing Sonic, an ultra-low latency text-to-speech model, and why rapid execution matters in voice AI, all while navigating the startup landscape.
AI Snips
Karan's Path to Cartesia
- Karan Goel's journey from IIT to CMU and Stanford, driven by a passion for AI and gaming.
- His PhD work with Chris Ré and Albert Gu led to the founding of Cartesia, which focuses on state-space models.
SSMs vs. Transformers
- State-space models (SSMs) scale subquadratically with context length, unlike transformers, whose attention cost grows quadratically.
- This makes SSMs more efficient for large context windows and long-running AI systems.
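
To make the scaling claim concrete, here is a minimal Python sketch. It is not Cartesia's architecture: `ssm_scan` is a toy diagonal linear recurrence with hypothetical parameters `a`, `b`, and `c`, and the two counting helpers are illustrative names that compare work, not real model costs. The point it shows is that a recurrent scan carries a fixed-size state forward, so its work grows linearly with sequence length, whereas causal self-attention scores every token against all earlier tokens.

```python
# Illustrative sketch only: a toy diagonal linear SSM, not any production model.

def ssm_scan(xs, a, b, c):
    """Run h_t = a * h_{t-1} + b * x_t and y_t = sum(c * h_t) over a scalar sequence.

    a, b, c are equal-length lists (hypothetical toy parameters).
    Each step touches only the fixed-size state, so total work is
    proportional to len(xs) * len(a).
    """
    h = [0.0] * len(a)
    ys = []
    for x in xs:
        # Update every state dimension from the previous state and the new input.
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        # Read out a scalar output from the current state.
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


def ssm_state_updates(n, state_size):
    """Scalar state updates performed by ssm_scan on n inputs."""
    return n * state_size


def causal_attention_pairs(n):
    """Query-key score computations in causal self-attention over n tokens."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    print(ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))
    for n in (1_000, 2_000, 4_000):
        print(n, ssm_state_updates(n, state_size=16), causal_attention_pairs(n))
```

Doubling `n` doubles the count from `ssm_state_updates` but roughly quadruples the count from `causal_attention_pairs`, which is the gap the bullets above describe for long context windows.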
SSMs and Data Types
- SSMs excel at processing signal data like audio and video because such signals are highly compressible.
- Their effectiveness in language modeling remains uncertain due to the lack of large-scale text SSMs.