Google’s DeepMind Develops Best Human-Voice-Mimicking AI Yet

Although modern advances in artificial intelligence have produced machines that can look, speak, and think a lot like humans, it is still fairly easy to tell the difference. That isn't stopping scientists and engineers from trying to close the gap, and Google's DeepMind appears to be leading the charge.

Google's DeepMind team has developed a new AI system called "WaveNet," and what makes it notable is that, according to DeepMind, it mimics human speech more convincingly than any previous software.

In a blog post, DeepMind reports that WaveNet can learn the distinctive characteristics of several voices, both male and female. To make sure the software uses the right voice for a given response, the engineers "conditioned the network on the identity of the speaker."
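"Conditioning" here means giving the network an extra input that tells it which voice to produce. The Python sketch below shows one simple way to do that, assuming each speaker is encoded as a one-hot vector that is attached to every input step. The speaker labels and function names are hypothetical, and plain concatenation is a simplification chosen for readability, not DeepMind's actual implementation.

```python
# Hypothetical sketch: encoding speaker identity as a one-hot conditioning vector.
# Speaker labels and the feature layout are illustrative assumptions,
# not DeepMind's actual implementation.

SPEAKERS = ["speaker_a", "speaker_b", "speaker_c"]  # hypothetical speaker labels


def one_hot_speaker(name):
    """Return a one-hot vector marking which speaker should be imitated."""
    vec = [0.0] * len(SPEAKERS)
    vec[SPEAKERS.index(name)] = 1.0
    return vec


def conditioned_input(audio_features, speaker):
    """Attach the same speaker vector to every time step of the input features."""
    speaker_vec = one_hot_speaker(speaker)
    return [frame + speaker_vec for frame in audio_features]
```

For example, `conditioned_input([[0.1], [0.2]], "speaker_b")` returns `[[0.1, 0.0, 1.0, 0.0], [0.2, 0.0, 1.0, 0.0]]`: the audio input is unchanged, but every step now also carries a flag saying "sound like speaker B."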

Until now, speech programs have been built using one of two main methods. The first relies on a massive database of words and speech fragments recorded in a single voice.

Although this sounds practical, relying on one recorded voice makes it difficult for the computer to adjust specific sounds and intonations. The second method generates words electronically based on how they are supposed to sound. This approach is easier to manipulate, but the result sounds robotic.
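To see why the first method is inflexible, the hypothetical Python sketch below stitches together prerecorded fragments from a single speaker. The fragment names and sample values are made up for illustration; real systems use far larger databases and smooth the joins between fragments.

```python
# Hypothetical sketch of the first method: stitching together prerecorded
# fragments from a single speaker. Each "recording" is a short list of
# audio samples; the values are invented for illustration.

FRAGMENT_DB = {
    "hel": [0.0, 0.3, 0.5],
    "lo": [0.4, 0.1, 0.0],
    "world": [0.2, 0.6, 0.2],
}


def synthesize(units):
    """Concatenate stored fragments; fail if a unit was never recorded."""
    samples = []
    for unit in units:
        if unit not in FRAGMENT_DB:
            raise KeyError(f"no recording for {unit!r}")
        samples.extend(FRAGMENT_DB[unit])
    return samples
```

`synthesize(["hel", "lo"])` returns `[0.0, 0.3, 0.5, 0.4, 0.1, 0.0]`. The output can only ever contain sounds that were already recorded, which is why changing the voice or intonation means recording new material.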

To build a speech program that sounds more human, the research team fed raw audio waveforms recorded from real human speakers into a neural network. A waveform describes how a sound's amplitude changes over time; plotted on a graph, it shows the shape of the sound. By learning directly from these waveforms, WaveNet produces more human-like speech.
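In practice, "raw audio" is simply a long list of numbers, one amplitude value per time step. The Python sketch below generates a pure tone instead of using a real recording, and assumes a rate of 16,000 samples per second, a common choice for speech audio; the function name is illustrative.

```python
import math

SAMPLE_RATE = 16_000  # samples per second; an assumed value for illustration


def sine_waveform(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return a pure tone as a list of amplitude samples in [-1, 1]."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]
```

Even `sine_waveform(440, 0.01)`, just one hundredth of a second of sound, yields 160 numbers, which gives a sense of how much data a network working on raw waveforms has to model for every second of speech.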

DeepMind researcher Aaron van den Oord explains: “Mimicking realistic speech has always been a major challenge, with state-of-the-art systems, composed of a complicated and long pipeline of modules, still lagging behind real human speech. Our research shows that not only can neural networks learn how to generate speech, but they can already close the gap with human performance by over 50%. This is a major breakthrough for text-to-speech systems, with potential uses in everything from smartphones to movies, and we’re excited to publish the details for the wider research community to explore.”

DeepMind also says that WaveNet can, in principle, model any kind of audio, although other sounds are more complex to mimic than the human voice. Eventually, the AI should be able to mimic human singing.