The Changelog: Software Development, Open Source

Bringing Whisper and LLaMA to the masses (Interview)

Mar 22, 2023
Georgi Gerganov, the software developer known for his work on Whisper.cpp and llama.cpp, joins the show to talk about AI. He discusses how Whisper handles audio transcription and translation, and how llama.cpp has outpaced competing projects in popularity. Georgi explains what it takes to run these models locally, free from corporate control. The conversation also covers the challenges of audio processing, the power of Apple Silicon for AI, and the prospects for open-source tools, with a touch of humor about EULAs.
INSIGHT

Whisper.cpp Accessibility

  • Whisper.cpp's simplicity and portability make large AI models accessible to regular developers.
  • Its compact implementation within two source files makes it less intimidating.
ANECDOTE

Whisper Transcription Speed

  • Jerod Santo tried OpenAI's Whisper model, and transcribing a podcast episode took 20 hours.
  • Georgi Gerganov's Whisper.cpp did the same job in 4 to 5 minutes on an M1 Mac.
ANECDOTE

Whisper.cpp Use Cases

  • Whisper.cpp's examples include karaoke movie generation and real-time audio input.
  • Community projects range from iOS/macOS apps to web-based transcription services using WebAssembly.