

404: The Transcription Challenge: Building Infrastructure That Scales With The World
Jul 18, 2025
Arvid describes the challenge of handling an overwhelming volume of audio data while building transcription infrastructure that scales. He covers strategies for keeping transcription quality high despite wide variation in podcast audio quality and volume, and explains why efficient systems are essential for keeping pace with the fast-growing podcast industry. Useful for anyone interested in transcription technology and podcasting.
AI Snips
Building for Podcast Scale
- Arvid built PodScan to transcribe all global podcast episodes, regardless of customer count.
- He tracks about 3.8 million shows and tens of thousands of new episodes per day.
Use Queues with Priority Levels
- Treat transcribing podcasts as a queuing system with priority tiers.
- Prioritize high-impact shows like Joe Rogan's for faster transcription.
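The priority-tier idea above can be sketched with a small heap-based queue. This is only an illustration, not PodScan's actual implementation: the tier names (`HIGH`, `NORMAL`, `LOW`), the `TranscriptionQueue` class, and the example URLs are all hypothetical. Jobs in a higher tier always come out first, and jobs within the same tier keep their arrival order.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical tier names (not from the episode); lower number = transcribed sooner.
HIGH, NORMAL, LOW = 0, 1, 2


@dataclass(order=True)
class Job:
    priority: int
    seq: int  # arrival counter, keeps first-in-first-out order within a tier
    episode_url: str = field(compare=False)


class TranscriptionQueue:
    """Minimal priority queue for transcription jobs (illustrative only)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def push(self, episode_url, priority=NORMAL):
        heapq.heappush(self._heap, Job(priority, next(self._counter), episode_url))

    def pop(self):
        # Returns the URL of the most urgent, earliest-queued job.
        return heapq.heappop(self._heap).episode_url

    def __len__(self):
        return len(self._heap)


queue = TranscriptionQueue()
queue.push("https://example.com/small-show/ep1.mp3", LOW)
queue.push("https://example.com/mid-show/ep7.mp3", NORMAL)
queue.push("https://example.com/big-show/ep2000.mp3", HIGH)
queue.push("https://example.com/mid-show/ep8.mp3", NORMAL)

# Drain the queue: the high-impact show comes out first despite arriving third.
order = [queue.pop() for _ in range(len(queue))]
print(order)
```

A real system would likely persist the queue and let several workers pull from it, but the ordering rule stays the same: tier first, then arrival time.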
Local Mac Studio Transcription
- Arvid ran his initial transcription queue locally on his Mac Studio using whisper.cpp.
- His Mac used the unified memory system to transcribe about 200 words per second.
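To put the 200 words per second figure in context, here is a rough capacity estimate. Only the throughput comes from the episode; the average episode length (9,000 words, roughly an hour of speech) and the daily volume (30,000 episodes, one reading of "tens of thousands") are assumptions chosen for illustration.

```python
import math

WORDS_PER_SECOND = 200            # throughput quoted in the episode
SECONDS_PER_DAY = 24 * 60 * 60

# Assumptions for illustration only, not figures from the episode.
ASSUMED_WORDS_PER_EPISODE = 9_000
ASSUMED_DAILY_EPISODES = 30_000


def episodes_per_day(words_per_episode, words_per_second=WORDS_PER_SECOND):
    """Episodes one machine could transcribe in 24 hours of nonstop work."""
    return words_per_second * SECONDS_PER_DAY / words_per_episode


def machines_needed(daily_episodes, words_per_episode):
    """Machines required to keep up with the daily volume, rounded up."""
    return math.ceil(daily_episodes / episodes_per_day(words_per_episode))


per_machine = episodes_per_day(ASSUMED_WORDS_PER_EPISODE)
machines = machines_needed(ASSUMED_DAILY_EPISODES, ASSUMED_WORDS_PER_EPISODE)

print(f"One machine: about {per_machine:.0f} episodes per day")
print(f"Machines needed for {ASSUMED_DAILY_EPISODES} episodes per day: {machines}")
```

Under these assumptions a single machine falls well short of the full daily volume, which is consistent with treating the workload as a prioritized queue rather than transcribing everything immediately.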