ThursdAI - The top AI news from the past week

ThursdAI - Apr 23rd - GPT Image & Grok APIs Drop, OpenAI ❤️ OS? Dia's Wild TTS & Building Better Agents!

Apr 24, 2025
Kwindla Kramer, co-founder of Daily and a voice AI expert, shares recent advances in text-to-speech technology and open-source voice activity detection. Maziyar Panahi, an AI researcher, discusses OpenAI's engagement with the open-source community. The conversation covers the new GPT Image generation API and NVIDIA's Describe Anything model, along with developments in voice interaction and how they improve the capabilities and real-world performance of AI agents.
INSIGHT

OpenAI's Practical Open Source Strategy

  • OpenAI plans to release models regularly, focusing on usefulness rather than topping leaderboards.
  • The community prefers manageably sized models (70-200B parameters) with structured output and without heavy reasoning, to keep costs down.
ANECDOTE

Dia: Open Source Emotional TTS

  • Two Korean students trained Dia, a 1.6B parameter emotional TTS model that impressed users.
  • Dia can produce overlapping speech and natural nonverbal sounds like laughter, making it highly expressive.
ADVICE

Balance Fun and Usefulness in TTS

  • For usable TTS, prioritize predictability and steerability over showing off raw model capabilities.
  • Research models like Dia are fun but often require multiple runs to achieve good results.