The AI Book

    Meta AI unveils ‘Seamless’ translator for real-time communication across languages

    1 December 2023


    Meta AI researchers announced on Thursday that they have developed a new suite of artificial intelligence models, called Seamless Communication, that aims to enable more natural and authentic communication across languages, essentially making the concept of a Universal Speech Translator a reality. The models were publicly released this week along with research papers and accompanying data.

    The flagship model, called Seamless, merges capabilities from three other models — SeamlessExpressive, SeamlessStreaming, and SeamlessM4T v2 — into one unified system. According to the research paper, Seamless is “the first publicly available system that unlocks expressive cross-lingual communication in real-time.”

    How Seamless works as a universal real-time translator

    The Seamless translator represents a new frontier in the use of AI for communication across the globe. It combines three sophisticated neural network models to enable real-time translation between over 100 spoken and written languages while preserving the vocal style, emotion, and prosody of the speaker’s voice.

    SeamlessExpressive focuses on preserving the vocal style and emotional nuances of the speaker’s voice when translating between languages. As described in the paper, “Translations should capture the nuances of human expression. While existing translation tools are skilled at capturing the content within a conversation, they typically rely on monotone, robotic text-to-speech systems for their output.” 

    SeamlessStreaming enables near real-time translation with only about two seconds of latency. The researchers say it is the “first massively multilingual model” to deliver such fast translation speeds across nearly 100 spoken and written languages.

    The third model, SeamlessM4T v2, serves as the foundation for the other two models. It is an upgraded version of the original SeamlessM4T model released in August 2023. The new architecture delivers “improved consistency between text and speech output,” according to the paper.

    “In sum, Seamless gives us a pivotal look at the technical foundation needed to turn the Universal Speech Translator from a science fiction concept into a real-world technology,” the researchers wrote.

    Potential to transform global communication

    The models’ capabilities could enable new voice-based communication experiences, from real-time multilingual conversations using smart glasses to automatically dubbed videos and podcasts. The researchers suggest the models could also help break down language barriers for immigrants and others who struggle with communication.

    “By publicly releasing our work, we hope that researchers and developers can expand the impact of our contributions by building technologies aimed at bridging multilingual connections in an increasingly interconnected and interdependent world,” the paper states.

    However, the researchers acknowledge the technology could also be misused for voice phishing scams, deepfakes and other harmful applications. To promote safety and responsible use of the models, they implemented several measures, including audio watermarking and new techniques to reduce hallucinated toxic outputs.

    Models publicly released on Hugging Face

    In keeping with Meta’s commitment to open research and collaboration, the Seamless Communication models have been publicly released on Hugging Face and GitHub.

    The collection includes the Seamless, SeamlessExpressive, SeamlessStreaming, and SeamlessM4T v2 models along with accompanying metadata.
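
    For readers who want to try the release, the following is a minimal sketch of text-to-text translation with SeamlessM4T v2 through the Hugging Face transformers library. The article does not give a checkpoint name or usage details, so the model ID `facebook/seamless-m4t-v2-large`, the language codes `eng` and `fra`, and the helper name `translate_text` are assumptions made for illustration. Calling the function downloads a multi-gigabyte checkpoint and requires `transformers` and `torch` to be installed.

```python
# Assumed checkpoint name on the Hugging Face Hub; not stated in the article.
MODEL_ID = "facebook/seamless-m4t-v2-large"


def translate_text(text: str, src_lang: str = "eng", tgt_lang: str = "fra") -> str:
    """Translate text with SeamlessM4T v2 (hypothetical helper, text output only)."""
    # Imported lazily so that defining this function does not require the
    # heavy dependencies to be installed.
    from transformers import AutoProcessor, SeamlessM4Tv2Model

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = SeamlessM4Tv2Model.from_pretrained(MODEL_ID)

    # Tokenize the source sentence, tagging it with its source language.
    inputs = processor(text=text, src_lang=src_lang, return_tensors="pt")

    # generate_speech=False asks for translated text tokens instead of audio.
    output_tokens = model.generate(**inputs, tgt_lang=tgt_lang, generate_speech=False)

    # Decode the first (only) sequence in the batch back into a string.
    return processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(translate_text("Hello, how are you today?"))
```

    This sketch covers only the SeamlessM4T v2 foundation model. The expressive and streaming models described above are distributed separately and are not shown here.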

    By making these state-of-the-art natural language processing models freely available, Meta hopes to enable fellow researchers and developers to build upon and extend this work to help connect people across languages and cultures. The release underscores Meta’s leadership in open source AI and provides a valuable new resource for the research community.

    “Overall, the multidimensional experiences Seamless may engender could lead to a step change in how machine-assisted cross-lingual communication is accomplished,” the researchers concluded.
