Google has launched Cloud Text-to-Speech, a new tool that can give developers easy access to voice-powered interactions in apps and services.
The AI-powered, cloud-based service can convert text to audible speech in 32 voices in 12 languages. It’s compatible with most apps already running on mobile devices, computers, tablets and IoT devices ranging from cars to appliances. (The service can be demoed on Google’s Cloud website.)
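For developers curious what a call to such a service might look like, here is a minimal sketch of a request body. It is not taken from the article: the endpoint URL, the field names and the voice name `en-US-Wavenet-D` are assumptions based on Google's public REST interface and should be checked against the current documentation. The helper `build_synthesis_request` is a hypothetical name used only for illustration.

```python
import json

# Assumed REST endpoint for the service (not stated in the article).
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesis_request(text, language_code="en-US", voice_name="en-US-Wavenet-D"):
    """Build the JSON body for a speech synthesis request (hypothetical helper)."""
    return {
        "input": {"text": text},  # the text to be spoken
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {"audioEncoding": "MP3"},  # requested output format
    }


# Serialize the body; nothing is sent over the network in this sketch.
payload = json.dumps(build_synthesis_request("Hello from the cloud."))
```

Actually sending the payload would require an authenticated POST to the endpoint, which is left out here.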
The service was created by DeepMind – a machine learning startup Google acquired in 2014 and brought to Edmonton last year – and uses its WaveNet voice technology.
Most text-to-voice services use what’s called “concatenative synthesis,” a process that stitches together recordings of syllables to mimic voices, often resulting in the “robotic” voices most people associate with text-to-speech programs, including Siri. But WaveNet, which has been used for Google’s Assistant since October, uses machine learning to analyze full audio waveforms of speech samples. The result is more “natural” sounding speech that can even reproduce subtle touches like accents.
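The stitching idea behind concatenative synthesis can be illustrated with a toy sketch. This is not how any production system is built: the short sine tones below stand in for recorded speech fragments, and `UNIT_BANK`, `tone` and `concatenate_units` are hypothetical names invented for the example.

```python
import math

SAMPLE_RATE = 8000  # samples per second, an arbitrary choice for this toy


def tone(freq_hz, duration_s=0.1):
    """Stand-in for a recorded speech unit: a short sine tone."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical unit inventory; a real system would store recorded fragments.
UNIT_BANK = {"hel": tone(220), "lo": tone(330), "world": tone(440)}


def concatenate_units(units):
    """Stitch stored units end to end, with no smoothing at the joins."""
    samples = []
    for unit in units:
        samples.extend(UNIT_BANK[unit])
    return samples


audio = concatenate_units(["hel", "lo", "world"])
```

Because each unit is pasted in unchanged, the waveform jumps abruptly at every join, which is one intuition for why stitched speech can sound mechanical. A waveform model like WaveNet instead generates the audio samples themselves.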
WaveNet was first unveiled by Google in 2016, at a time when the computing power needed to run it was rare outside of research labs. Since then, the team at DeepMind has optimized it to the point where it can run on most mobile devices.
Google has identified several use cases for this kind of service, from call centre automation to creating more interactive experiences with IoT devices to adding voice feedback to any text-based interaction.