Latent Space: The AI Engineer Podcast

Making Transformers Sing - with Mikey Shulman of Suno

Mar 14, 2024
Mikey Shulman, CEO and co-founder of the music generation startup Suno, shares his journey from finance to creating innovative AI-driven audio experiences. The discussion dives into the fascinating challenges of transforming text into music and the unique complexities tied to audio creation. They explore the balance between accessibility and artistry, the emotional depth AI can express, and even compose a humorous country tune about cloud computing hurdles. Shulman also highlights the evolving role of AI in music sampling and audience participation.
INSIGHT

Music Model Architecture

  • Audio generation lags behind text and image generation by one to two years.
  • Current music models, like those used by Suno, function similarly to text-based language models, predicting sequences of audio tokens.
ADVICE

General vs. Specific Models

  • Avoid building specific music theory into the model.
  • Allow the model to learn general music principles independently, similar to how large language models learn grammar.
ANECDOTE

Suno's Origins

  • Mikey Shulman and his co-founders at Suno initially focused on speech recognition due to market demand.
  • However, their passion for music led them to develop music models, resulting in Bark and eventually Suno.