A brand is a fictional person, and like a person it has many unique characteristics, including a voice. A brand’s voice lets users recognize its personality the moment they hear it. Today, Amazon launched “Brand Voice,” a fully automated service within its cloud text-to-speech product Amazon Polly. The service converts text into realistic speech and gives customers a voice customized for them. As Amazon’s head of AI voice Rafal Kuklinski and senior product manager Ankit Dhawan explained in a blog post, Brand Voice lets companies differentiate themselves from other brands by incorporating a unique sonic signature into their products and services. “Every company can have their own unique sonic brand,” they wrote.
Amazon partnered with KFC to give the chain’s mascot, Colonel Sanders (“KFC Grandpa”), a voice with a Southern United States English accent, and launched it on Amazon Alexa. It also designed an Australian English voice for National Australia Bank, which migrated its contact center to Amazon Connect, Amazon’s omnichannel cloud contact center product.
Late last year, Amazon detailed its work on AI-generated speech in a research paper (“The Impact of Data Reduction Effects on Text-to-Speech Conversion”), in which researchers described a system that could learn a new speaking style from just a few hours of training data. A voice actor might need to record dozens of hours to achieve the same result.
Amazon’s AI model consists of two parts. The first is a neural network that converts a sequence of phonemes into a sequence of spectrograms, visual representations of the frequency spectrum of a sound over time. The second is a vocoder, which converts the spectrograms into a continuous audio signal. The training method combines a large amount of neutral-style speech data with a smaller amount of data in the desired style, along with an AI system that can distinguish between the styles of speech. Amazon already uses the approach internally to generate new voices for Alexa.
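To make the idea of a spectrogram more concrete, here is a small, self-contained Python sketch. It is only an illustration of "the frequency spectrum of a sound over time," not Amazon's model: the sample rate, frame size, test tones, and all function names are assumptions chosen for this example, and a naive discrete Fourier transform stands in for the faster transforms a real system would use.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this toy example)
FRAME_SIZE = 200    # samples per frame, giving 8000 / 200 = 40 Hz per frequency bin


def make_tone(freq_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Generate a pure sine tone as a list of samples."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(n_samples)]


def frame_magnitudes(frame):
    """Naive DFT of one frame; returns the magnitude of each non-negative frequency bin."""
    n = len(frame)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        magnitudes.append(abs(acc))
    return magnitudes


def spectrogram(signal, frame_size=FRAME_SIZE):
    """Split the signal into non-overlapping frames; one row of magnitudes per frame."""
    starts = range(0, len(signal) - frame_size + 1, frame_size)
    return [frame_magnitudes(signal[s:s + frame_size]) for s in starts]


def peak_frequencies(spec, sample_rate=SAMPLE_RATE, frame_size=FRAME_SIZE):
    """Return the frequency (in Hz) of the strongest bin in each frame."""
    bin_width = sample_rate / frame_size
    return [max(range(len(row)), key=row.__getitem__) * bin_width for row in spec]


# A sound whose pitch changes over time: 400 Hz first, then 800 Hz.
signal = make_tone(400, 400) + make_tone(800, 400)
spec = spectrogram(signal)      # time runs along rows, frequency along columns
peaks = peak_frequencies(spec)  # strongest frequency in each time frame
print(peaks)
```

Each row of `spec` describes one short slice of time, and each column one frequency band, which is why a spectrogram can be drawn as an image. In a system like the one described, the neural network would predict such rows from phonemes, and the vocoder would perform the reverse step of turning them back into a waveform.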
The technology has clear commercial value. Brand voices (for example, the character Flo, played by actress Stephanie Courtney) are often tasked with recording a phone tree for an interactive voice response system or an e-learning script for a corporate training video. Speech synthesis can make actors more efficient by reducing ancillary recording and listening work, freeing them up for creative work.
Amazon and Google stand out in this space with Brand Voice and other text-to-speech services. Google recently launched 31 AI-synthesized WaveNet voices and 24 new standard voices for its Cloud Text-to-Speech service. Amazon has another notable competitor in Microsoft, which offers three AI-generated preview voices and 75 standard voices through the Azure Speech Service API. Amazon’s Brand Voice also competes with offerings from a number of startups, such as Voicery, which offers customized digital voices that sound impressively human. Text-to-speech startup iSpeech has similar voice tools, as do Modulate, Respeecher, Resemble AI, Descript and Bengaluru-based DeepSync. (Source: Cross-border Sellers Teahouse)