The AI Book

    Meta unveils Audiobox AI for voice cloning, making ambient sounds

12 December 2023



    Voice cloning is one of the areas rapidly emerging thanks to generative AI. The term refers to replicating a person’s vocal stylings — pitch, timbre, rhythms, mannerisms, and unique pronunciations — through technology.

While startups including ElevenLabs have received tens of millions in funding for dedicating themselves to this pursuit, Meta Platforms, the parent company of Facebook, Instagram, WhatsApp and Oculus VR, has released its own free voice cloning program, Audiobox, with a catch.

Unveiled today on Meta’s website by researchers at the Facebook AI Research (FAIR) lab, Audiobox is described as a “new foundation research model for audio generation” built atop the company’s earlier work in this area, Voicebox.

    “It can generate voices and sound effects using a combination of voice inputs and natural language text prompts — making it easy to create custom audio for a wide range of use cases,” reads the Audiobox webpage.


    Simply type in a sentence that you want a cloned voice to say, or a description of a sound you want to generate, and Audiobox will do the rest. Users can also record their own voice and have it cloned by Audiobox.

    A ‘family’ of audio generating AIs

Meta further noted that it actually created a “family of models”: one for speech mimicry and another for generating ambient sounds and sound effects, such as dogs barking, sirens, or children playing. All of them are “built upon the shared self-supervised model Audiobox SSL.”

Self-supervised learning (SSL) is a deep learning technique in which algorithms generate their own labels for unlabeled data, in contrast to supervised learning, where the data comes already labeled.

    The researchers published a scientific paper explaining some of their methodology and rationale for taking an SSL approach, writing “because labeled data are not always available or of high quality, and data scaling is the key to generalization, our strategy is to train this foundation model using audio without any supervision, such as transcripts, captions, or attribute labels, which can be found in larger quantities.”
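The core idea can be illustrated with a toy masked-reconstruction objective: hide parts of the raw audio and train a model to predict them, so the hidden samples themselves serve as labels. This is an illustrative sketch only; the naive neighbor-averaging “model” is a stand-in, and Audiobox SSL’s actual architecture and objective are those described in Meta’s paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "audio" signal: one second of a 440 Hz sine wave at an 8 kHz sample rate.
sr = 8000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t)

# Self-supervised setup: zero out random samples of the input. The labels
# are the hidden samples themselves; no transcript or caption is required.
mask = rng.random(audio.shape) < 0.15  # hide roughly 15% of samples
corrupted = np.where(mask, 0.0, audio)

# Stand-in "model": predict each masked sample as the mean of its two
# neighbors in the corrupted signal (a real SSL model learns this mapping).
prediction = corrupted.copy()
prediction[mask] = 0.5 * (np.roll(corrupted, 1)[mask] + np.roll(corrupted, -1)[mask])

# Training signal: reconstruction error on the masked positions only.
loss = float(np.mean((prediction[mask] - audio[mask]) ** 2))
print(loss)
```

Because the loss is computed only where samples were hidden, the model is rewarded for filling in plausible audio, which is the scaling-friendly property the researchers cite: any unlabeled recording can serve as training data.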

Of course, most leading generative AI models depend heavily on human-generated data to learn how to create new content, and Audiobox is no exception. The FAIR researchers relied upon “160K hours of speech (primarily English), 20K hours of music and 6K hours of sound samples.”

    “The speech portion covers audiobooks, podcasts, read sentences, talks, conversations, and in-the-wild recordings including various acoustic conditions and non-verbal voices. To ensure fairness and a good representation for people from various groups, it includes speakers from over 150 countries speaking over 200 different primary languages.”

The research paper does not specify exactly where this data was sourced or whether it was in the public domain. That is an important question, given that various artists, authors, and music publishers are suing a host of AI companies for training on potentially copyrighted material without the creators’ or rights owners’ express consent. We’ve reached out to a Meta spokesperson for clarification and will update when we receive it.

    You can try it yourself and clone your own voice now

To showcase Audiobox’s capabilities, Meta has also released a host of interactive demos, including one that lets users record themselves speaking about a sentence’s worth of text and then replicates their voice.

    Then, the user can type in text that they want their cloned voice to say and hear it read back to them in their cloned voice.

You can try it for yourself here. In my case, the resulting AI-generated clone was eerily similar to, though not exactly the same as, my own voice (as attested by my wife and child, who heard it without knowing what it was).

Meta also allows users to generate entirely new voices from text descriptions of what they should sound like (“deep feminine voice,” “high pitched masculine speaker from the U.S.,” etc.), restyle voices they have recorded, or type in a text prompt to generate a whole new sound. I tried the latter with “dogs barking” and received two versions that were indistinguishable from the real thing to my ears.

    Now for the big catch: Meta includes a disclaimer with its Audiobox interactive demos noting that “this is a research demo and may not be used for any commercial purpose(s),” and furthermore, that it is restricted to those outside of “the States of Illinois or Texas,” which have state laws that apparently prohibit the kind of audio collection Meta is doing for the demos.

Interestingly, like its new Imagine by Meta AI image generation web app unveiled last week, Audiobox is not open source, bucking the commitment to openness Meta evidenced earlier with the release of its Llama 2 family of large language models (LLMs). We also asked our Meta contact whether Audiobox would be made open source at some point and will update when we receive a response.

So, for now, the technology can’t be used for any moneymaking or business purposes, nor can it be used by residents of two of the most populous states in the U.S. But with AI advancing at a rapid clip, expect this to change and commercial versions to appear in the near future, if not from Meta, then from others.

