How to Co-Train Voices Together

When DeepMind launched WaveNet in 2016, you needed about four hours' worth of audio samples from a person to model how their voice sounds. But now you can do it with just a few minutes' worth of audio. Google has built an enormous data set with professional voice actors reading out the same text. The model learns from all of these samples how particular words are pronounced. Now the third and final part is the acoustic modelling. Acoustic modelling focuses on who it sounds like. If I pretend to sound like my brother on the phone, it still sounds like me. My friend will be able to tell. If I say the sentence with a different tone of voice, you

