Introduction
In the vast field of artificial intelligence, deep learning has revolutionized many domains, including natural language processing, computer vision, and speech recognition. One fascinating area that has captivated researchers and music enthusiasts alike is the generation of music with AI algorithms. This article looks at MusicGen, a state-of-the-art controllable text-to-music model that transforms text prompts into engaging musical compositions.
What is MusicGen?
MusicGen is a model designed for music generation that offers both simplicity and control. Unlike existing methods such as MusicLM, MusicGen does not require a self-supervised semantic representation. The model uses a single-stage autoregressive transformer architecture and is trained with a 32 kHz EnCodec tokenizer. Notably, MusicGen generates all four codebooks in one pass, which sets it apart from conventional approaches. By introducing a small delay between the codebooks, the model can predict them in parallel, requiring only 50 autoregressive steps per second of audio. This approach improves the efficiency and speed of music generation.
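To make the delay idea concrete, here is a minimal illustrative sketch (not Audiocraft's actual implementation) of how shifting codebook k by k steps lets a single autoregressive pass emit one token per codebook at every step:
import torch

num_codebooks, steps = 4, 8
codes = torch.arange(steps).repeat(num_codebooks, 1)  # dummy token grid, 4 x 8

PAD = -1  # placeholder where a codebook has nothing to predict yet
delayed = torch.full((num_codebooks, steps + num_codebooks - 1), PAD)
for k in range(num_codebooks):
    delayed[k, k:k + steps] = codes[k]  # codebook k starts k steps late

print(delayed)
# Each column now contains one token per codebook, so all four codebooks
# can be predicted in parallel at every autoregressive step.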
MusicGen was trained on 20,000 hours of licensed music: an in-house dataset of 10K high-quality music tracks, plus music data from ShutterStock and Pond5.
Prerequisites:
According to the official MusicGen GitHub repo (https://github.com/facebookresearch/audiocraft/tree/main), you will need:
- GPU with at least 16 GB of memory
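On Colab, you can confirm which GPU is attached and how much memory it has by running:
!nvidia-smi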
Available MusicGen models
There are 4 pre-trained models available and they are as follows:
- Small: 300M model, text to music only
- Medium: 1.5B model, text to music only
- Melody: 1.5B model, text to music and text+melody to music
- Large: 3.3B model, text to music only
Experiments
Below is the conditional music generation output using the MusicGen large model.
Text Input: Jingle bell tune with violin and piano
Output: (Using MusicGen "large" model)
Below is the output of the MusicGen “melody” model. We used the audio generated above, together with the text prompt below, to create the following audio.
Text Input: Add heavy drums drums and only drums
Output: (Using MusicGen "melody" model)
How to install MusicGen on Colab
Make sure you use a GPU for faster inference. It took ~9 minutes to generate 10 seconds of audio on the CPU, and only 35 seconds on a GPU (T4).
- Before starting, make sure that torch and torchaudio are installed in the Colab environment; you can verify your setup with the quick check below.
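An optional sanity check to confirm the installation and that CUDA is visible:
import torch
import torchaudio

print(torch.__version__, torchaudio.__version__)
print("CUDA available:", torch.cuda.is_available())  # should print True on a GPU runtime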
Install the Audiocraft library from Facebook.
!python3 -m pip install -U git+https://github.com/facebookresearch/audiocraft#egg=audiocraft
Import required libraries.
from audiocraft.models import musicgen
from audiocraft.utils.notebook import display_audio
import torch
from audiocraft.data.audio import audio_write
Load the model
The list of models is as follows:
# | model types are => small, medium, melody, large |
# | size of models are => 300M, 1.5B, 1.5B, 3.3B |
model = musicgen.MusicGen.get_pretrained('large', device='cuda')
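If your GPU has less memory, the smaller checkpoints are loaded the same way, just with a different model name (e.g. the 300M model):
model = musicgen.MusicGen.get_pretrained('small', device='cuda')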
Setting the parameters (optional)
model.set_generation_params(duration=60) # this will generate 60 seconds of audio.
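Besides duration, set_generation_params also exposes sampling controls. The sketch below shows the main ones; the values shown are the library defaults at the time of writing, so check the Audiocraft repo if in doubt.
model.set_generation_params(
    use_sampling=True,  # sample from the token distribution instead of greedy decoding
    top_k=250,          # limit sampling to the 250 most likely tokens
    temperature=1.0,    # higher values = more random output
    cfg_coef=3.0,       # classifier-free guidance: how strongly to follow the text prompt
    duration=60,        # seconds of audio to generate
)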
Conditional music generation (generating music from text)
model.set_generation_params(duration=60)
res = model.generate(['Jingle bell tune with violin and piano'], progress=True)
display_audio(res, 32000)  # this will show the audio player controls in Colab
Generating unconditional music
res = model.generate_unconditional(num_samples=1, progress=True)
display_audio(res, 32000)  # this will show the audio player controls on the screen
Generating a continuation of the music
To create a continuation of the music, we need an audio file. We feed this file to the model, and the model generates more music that extends it.
from audiocraft.utils.notebook import display_audio
import torchaudio
path_to_audio = "path-to-audio-file.wav"
description = "Jazz jazz and only jazz"
# Load audio from a file. Make sure to trim the file if it is too long!
prompt_waveform, prompt_sr = torchaudio.load(path_to_audio)
# Use only the first 15 seconds of the file as the prompt.
prompt_duration = 15
prompt_waveform = prompt_waveform[..., :int(prompt_duration * prompt_sr)]
output = model.generate_continuation(prompt_waveform, prompt_sample_rate=prompt_sr,
                                     descriptions=[description], progress=True)
display_audio(output, sample_rate=32000)
Melody-conditional generation
model = musicgen.MusicGen.get_pretrained('melody', device='cuda')
model.set_generation_params(duration=20)
melody_waveform, sr = torchaudio.load("path-to-audio-file.wav")
# Duplicate the melody along the batch dimension to generate two variations at once.
melody_waveform = melody_waveform.unsqueeze(0).repeat(2, 1, 1)
output = model.generate_with_chroma(
    descriptions=['Add heavy drums'] * 2,  # one description per sample in the batch
    melody_wavs=melody_waveform,
    melody_sample_rate=sr,
    progress=True)
display_audio(output, sample_rate=32000)
Write the audio file to disk
If you want to download a file from Colab, you first need to write the wav file to disk. Here is a function that does this: it takes the output of the model as the first argument and a file-name prefix as the second.
def write_wav(output, file_initials):
    try:
        for idx, one_wav in enumerate(output):
            # audio_write appends the .wav extension itself
            audio_write(f'{file_initials}_{idx}', one_wav.cpu(), model.sample_rate,
                        strategy="loudness", loudness_compressor=True)
        return True
    except Exception as e:
        print("error while writing the file ", e)
        return None
# this will write files whose names start with "audio-file"
write_wav(res, "audio-file")
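To then pull the written files down from Colab to your machine, you can use Colab's files helper. Since audio_write appends the .wav extension itself, the prefix above should produce a file named audio-file_0.wav:
from google.colab import files
files.download("audio-file_0.wav")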
Full implementation (Google Colab file link)
A complete implementation of Meta’s MusicGen library by Pragnakalp Techlabs is provided in the Colab file. Feel free to explore and create music using it.
Pragnakalp Techlabs | Meta’s MusicGen implementation
Conclusion
In conclusion, Audiocraft’s MusicGen is a powerful and controllable music generation model. Looking ahead, Audiocraft has exciting future potential for advancements in AI-generated music. Whether you’re a musician or an AI enthusiast, Audiocraft’s MusicGen opens up a world of creative possibilities.