
Barrchives

How Cartesia Edges Out The Big Labs With Audio AI Models, with Karan Goel, Founder and CEO at Cartesia

Mar 26, 2025
Karan Goel, Co-founder and CEO of Cartesia, dives into the future of voice AI and the groundbreaking use of state space models (SSMs) for audio applications. He details his transition from academia at CMU and Stanford to entrepreneurship, emphasizing the innovative efficiency of SSMs over traditional models. Karan also reveals how Cartesia is developing Sonic, an ultra-low latency text-to-speech model, and elaborates on the importance of rapid execution in voice AI, all while navigating the startup landscape.
54:02


Podcast summary created with Snipd AI

Quick takeaways

  • State-space models (SSMs) offer a more scalable and efficient alternative to traditional transformer architectures, particularly for managing large multimodal datasets like audio.
  • The transition from academic to entrepreneurial endeavors emphasizes the significance of collaborative relationships and shared visions for success in AI applications.

Deep dives

Understanding State-Space Models

State-space models (SSMs) are gaining attention for their ability to improve the scalability of AI systems, particularly when processing very long inputs. Unlike traditional transformer architectures, whose attention cost scales quadratically with context length, SSMs aim for subquadratic scaling: adding more context still increases compute, but the cost grows closer to linearly rather than quadratically. This efficiency matters most for large multimodal data such as audio and video, where sequences are long and high-rate. The ability of SSMs to manage long context with a fixed-size state positions them as a practical way around one of the main computational bottlenecks in AI.
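To make the scaling contrast concrete, here is a minimal sketch in Python/NumPy. It is not Cartesia's code: the diagonal-A recurrence, the parameter names (`d_state`, `seq_len`), and the specific values are assumptions chosen only to show why attention's cost grows with the square of sequence length while an SSM-style scan grows linearly.

```python
# Illustrative sketch only: attention builds a (seq_len x seq_len) score matrix,
# while a state-space recurrence keeps a fixed-size state and does one update per step.
import numpy as np

def attention_scores(x):
    # Every position is compared with every other position,
    # so compute and memory grow quadratically with sequence length.
    return x @ x.T

def ssm_scan(x, A, B, C):
    # A simple diagonal state-space recurrence: the hidden state has a fixed
    # size, so compute grows linearly with sequence length.
    state = np.zeros(A.shape[0])
    outputs = []
    for u in x:                      # one pass over the sequence
        state = A * state + B * u    # elementwise update (diagonal A)
        outputs.append(C @ state)    # project the state to an output sample
    return np.array(outputs)

seq_len, d_state = 8, 4              # hypothetical toy sizes
x = np.random.randn(seq_len)
A = np.full(d_state, 0.9)            # decay factors (made-up values)
B = np.random.randn(d_state)
C = np.random.randn(d_state)

print(attention_scores(x.reshape(-1, 1)).shape)  # (8, 8): quadratic in length
print(ssm_scan(x, A, B, C).shape)                # (8,):   linear in length
```

The takeaway is the shape of the intermediate work: attention materializes an n-by-n comparison, whereas the scan touches each input once against a constant-size state, which is what makes long audio streams tractable.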
