

A new AI translation system for headphones clones multiple voices simultaneously


Spatial Speech Translation consists of two AI models. The first divides the space surrounding the headphone wearer into small regions and uses a neural network to search for potential speakers and pinpoint their direction.
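The region-by-region search can be sketched roughly as follows. This is an illustrative assumption, not the system's actual implementation: the neural network is stubbed out as a placeholder scoring function, and the region layout, field names, and threshold are all invented for the example.

```python
# Illustrative sketch: scan small angular regions around the wearer for speakers.
# detect_speaker_prob is a stand-in for the real neural network.

def detect_speaker_prob(region_audio):
    # Placeholder score: a real system would run a neural model
    # on the audio extracted from this region.
    return region_audio.get("energy", 0.0)

def find_speakers(regions, threshold=0.5):
    """Return the directions (in degrees) of regions likely to contain a speaker."""
    directions = []
    for region in regions:
        if detect_speaker_prob(region) >= threshold:
            directions.append(region["azimuth_deg"])
    return directions

# Example: three 120-degree regions, with apparent speakers at 0 and 240 degrees.
regions = [
    {"azimuth_deg": 0, "energy": 0.9},
    {"azimuth_deg": 120, "energy": 0.1},
    {"azimuth_deg": 240, "energy": 0.7},
]
print(find_speakers(regions))  # -> [0, 240]
```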

The second model then translates the speakers' words from French, German, or Spanish into English text using publicly available data sets. The same model extracts the unique characteristics and emotional tone of each speaker's voice, such as pitch and amplitude, and applies those properties to the text, essentially creating a "cloned" voice. This means that when the translated version of a speaker's words is relayed to the headphone wearer a few seconds later, it sounds as if it's coming from the speaker's direction, and the voice sounds a lot like the speaker's own rather than like a robotic computer.
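The translate-then-clone flow described above can be illustrated with a minimal sketch. Every function and data structure here is a hypothetical stand-in for a model the article describes; the translation lookup and voice attributes are invented for the example and do not reflect the system's real API.

```python
# Illustrative sketch of the second stage: translate the text,
# then re-render it with the original speaker's voice traits and direction.
from dataclasses import dataclass

@dataclass
class VoiceProfile:
    pitch_hz: float    # speaker's characteristic pitch
    amplitude: float   # speaker's typical loudness

def translate_to_english(text, source_lang):
    # Stub: a real system would run a speech-translation model here.
    lookup = {("Bonjour", "fr"): "Hello"}
    return lookup.get((text, source_lang), text)

def clone_voice(english_text, profile, direction_deg):
    # Stub: apply the speaker's pitch and amplitude to the translated text
    # and spatialize the audio so it appears to come from their direction.
    return {
        "text": english_text,
        "pitch_hz": profile.pitch_hz,
        "amplitude": profile.amplitude,
        "azimuth_deg": direction_deg,
    }

profile = VoiceProfile(pitch_hz=180.0, amplitude=0.6)
out = clone_voice(translate_to_english("Bonjour", "fr"), profile, direction_deg=240)
print(out["text"])  # -> Hello
```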

Separating out human voices is hard enough for AI systems on its own. Incorporating that ability into a real-time translation system, mapping the distance between the wearer and the speakers, and achieving decent latency on a real device is impressive, says Samuele Cornell, a postdoc researcher at Carnegie Mellon University's Language Technologies Institute, who did not work on the project.

"Real-time speech translation is incredibly hard," he says. "Their results are very good in the limited testing settings. But for a real product, one would need much more training data, with noise and real-world recordings from the headset, rather than relying purely on synthetic data."

Gollakota's team is now focused on reducing the time it takes for the AI translation to kick in after a speaker says something, which would allow for more natural conversations between people speaking different languages. "We want to really get down that latency significantly to less than a second, so that you can still have the conversational vibe," Gollakota says.

This remains a major challenge, because the speed at which the system can translate one language into another depends on the languages' structure. Of the three languages Spatial Speech Translation was trained on, the system was quickest to translate French into English, followed by Spanish and then German, reflecting the fact that German, unlike the other languages, places a sentence's verbs and much of its meaning at the end rather than at the beginning, says Claudio Fantinuoli, a researcher at the Johannes Gutenberg University Mainz in Germany, who did not work on the project.

Reducing the latency could make the translations less accurate, he warns: "The longer you wait [before translating], the more context you have, and the better the translation will be. It's a balancing act."
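The latency-versus-context trade-off can be made concrete with a toy model. Everything below is an illustrative assumption (one word arriving at a fixed interval, context measured as a word count); it only shows why a longer wait yields more material for the translator, which matters most for verb-final languages like German.

```python
# Toy model of the wait-time/context trade-off: the longer the system
# buffers incoming speech before translating, the more words of context
# the translation model can see.

def buffer_context(words, word_interval_s, wait_s):
    """Return the words available after waiting wait_s seconds, assuming
    one word arrives every word_interval_s seconds (an illustrative model)."""
    n = min(len(words), round(wait_s / word_interval_s))
    return words[:n]

# In German the verb often arrives last, so a short buffer misses it.
speech = ["ich", "habe", "den", "Apfel", "gegessen"]
short_wait = buffer_context(speech, word_interval_s=0.4, wait_s=0.8)
long_wait = buffer_context(speech, word_interval_s=0.4, wait_s=2.0)
print(len(short_wait), len(long_wait))  # -> 2 5
```

A sub-second buffer sees only two words here and misses the verb entirely, while the two-second buffer captures the full clause, which is the balancing act Fantinuoli describes.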



