Introduction
In the vast field of artificial intelligence, deep learning has revolutionized many domains, including natural language processing, computer vision, and speech recognition. One fascinating area that has captivated researchers and music enthusiasts alike is the generation of music with AI algorithms. This article looks at MusicGen, a state-of-the-art controllable text-to-music model that transforms text prompts into engaging musical compositions.
What is MusicGen?
MusicGen is a model designed for music generation that offers both simplicity and control. Unlike existing methods such as MusicLM, MusicGen does not require a self-supervised semantic representation. The model uses a single-stage autoregressive transformer architecture and is trained with a 32 kHz EnCodec tokenizer. Notably, MusicGen generates all four codebooks in one pass, which sets it apart from conventional approaches. By introducing a small delay between the codebooks, the model can predict them in parallel, requiring only 50 autoregressive steps per second of audio. This approach improves the efficiency and speed of music generation.
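To make the delay idea concrete, here is a minimal illustrative sketch (not Audiocraft's actual implementation) of how shifting codebook k by k steps lets a single autoregressive pass emit one token per codebook at every step:
import torch

num_codebooks, steps = 4, 8
codes = torch.arange(steps).repeat(num_codebooks, 1)  # dummy token grid, 4 x 8

PAD = -1  # placeholder where a codebook has nothing to predict yet
delayed = torch.full((num_codebooks, steps + num_codebooks - 1), PAD)
for k in range(num_codebooks):
    delayed[k, k:k + steps] = codes[k]  # codebook k starts k steps late

print(delayed)
# Each column now contains one token per codebook, so all four codebooks
# can be predicted in parallel at every autoregressive step.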
MusicGen was trained on 20,000 hours of licensed music: an in-house dataset of 10K high-quality music tracks, plus music data from ShutterStock and Pond5.
Prerequisites:
According to the official MusicGen GitHub repo (https://github.com/facebookresearch/audiocraft/tree/main), you will need:
- GPU with at least 16 GB of memory
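On Colab, you can confirm which GPU is attached and how much memory it has by running:
!nvidia-smi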
Available MusicGen models
There are 4 pre-trained models available and they are as follows:
- Small: 300M model, text to music only
- Medium: 1.5B model, text to music only
- Melody: 1.5B model, text to music and text+melody to music
- Large: 3.3B model, text to music only
Experiments
Below is the conditional music generation output using the MusicGen large model.
Text Input: Jingle bell tune with violin and piano
Output: (Using MusicGen "large" model)
Below is the output of the MusicGen “melody” model. We used the audio generated above, together with the text prompt below, to create the following audio.
Text Input: Add heavy drums drums and only drums
Output: (Using MusicGen "melody" model)
How to install MusicGen on Colab
Make sure you use a GPU for faster inference. It took ~9 minutes to generate 10 seconds of audio on the CPU, and only 35 seconds on a GPU (T4).
- Before starting, make sure that torch and torchaudio are installed in the Colab environment; you can verify your setup with the quick check below.
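An optional sanity check to confirm the installation and that CUDA is visible:
import torch
import torchaudio

print(torch.__version__, torchaudio.__version__)
print("CUDA available:", torch.cuda.is_available())  # should print True on a GPU runtime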
Install the Audiocraft library from Facebook.
!python3 -m pip install -U git+https://github.com/facebookresearch/audiocraft#egg=audiocraft
Import required libraries.
from audiocraft.models import musicgen
from audiocraft.utils.notebook import display_audio
import torch
from audiocraft.data.audio import audio_write
Load the model
The list of models is as follows:
# | model types are => small, medium, melody, large |
# | size of models are => 300M, 1.5B, 1.5B, 3.3B |
model = musicgen.MusicGen.get_pretrained('large', device='cuda')
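If your GPU has less memory, the smaller checkpoints are loaded the same way, just with a different model name (e.g. the 300M model):
model = musicgen.MusicGen.get_pretrained('small', device='cuda')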
Setting the parameters (optional)
model.set_generation_params(duration=60) # this will generate 60 seconds of audio.
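Besides duration, set_generation_params also exposes sampling controls. The sketch below shows the main ones; the values shown are the library defaults at the time of writing, so check the Audiocraft repo if in doubt.
model.set_generation_params(
    use_sampling=True,  # sample from the token distribution instead of greedy decoding
    top_k=250,          # limit sampling to the 250 most likely tokens
    temperature=1.0,    # higher values = more random output
    cfg_coef=3.0,       # classifier-free guidance: how strongly to follow the text prompt
    duration=60,        # seconds of audio to generate
)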
Conditional music generation (generating music from text)
model.set_generation_params(duration=60)
res = model.generate(['Jingle bell tune with violin and piano'], progress=True)
display_audio(res, 32000)  # this will show the audio player controls in Colab
Generating unconditional music
res = model.generate_unconditional(num_samples=1, progress=True)
display_audio(res, 32000)  # this will show the audio player controls on the screen
Generating a continuation of the music
To create a continuation of the music, we need an audio file. We feed this file to the model, and the model generates more music that extends it.
from audiocraft.utils.notebook import display_audio
import torchaudio
path_to_audio = "path-to-audio-file.wav"
description = "Jazz jazz and only jazz"
# Load audio from a file. Make sure to trim the file if it is too long!
prompt_waveform, prompt_sr = torchaudio.load(path_to_audio)
# Use only the first 15 seconds of the file as the prompt.
prompt_duration = 15
prompt_waveform = prompt_waveform[..., :int(prompt_duration * prompt_sr)]
output = model.generate_continuation(prompt_waveform, prompt_sample_rate=prompt_sr,
                                     descriptions=[description], progress=True)
display_audio(output, sample_rate=32000)
Melody-conditional generation
model = musicgen.MusicGen.get_pretrained('melody', device='cuda')
model.set_generation_params(duration=20)
melody_waveform, sr = torchaudio.load("path-to-audio-file.wav")
# Duplicate the melody along the batch dimension to generate two variations at once.
melody_waveform = melody_waveform.unsqueeze(0).repeat(2, 1, 1)
output = model.generate_with_chroma(
    descriptions=['Add heavy drums'] * 2,  # one description per sample in the batch
    melody_wavs=melody_waveform,
    melody_sample_rate=sr,
    progress=True)
display_audio(output, sample_rate=32000)
Write the audio file to disk
If you want to download a file from Colab, you first need to write the wav file to disk. Here is a function that does this: it takes the output of the model as the first argument and a file-name prefix as the second.
def write_wav(output, file_initials):
    try:
        for idx, one_wav in enumerate(output):
            # audio_write appends the .wav extension itself
            audio_write(f'{file_initials}_{idx}', one_wav.cpu(), model.sample_rate,
                        strategy="loudness", loudness_compressor=True)
        return True
    except Exception as e:
        print("error while writing the file ", e)
        return None
# this will write files whose names start with "audio-file"
write_wav(res, "audio-file")
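To then pull the written files down from Colab to your machine, you can use Colab's files helper. Since audio_write appends the .wav extension itself, the prefix above should produce a file named audio-file_0.wav:
from google.colab import files
files.download("audio-file_0.wav")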
Full implementation (Google Colab file link)
A complete implementation of Meta’s MusicGen library by Pragnakalp Techlabs is provided in the Colab file. Feel free to explore and create music using it.
Pragnakalp Techlabs | Meta’s MusicGen implementation
Conclusion
In conclusion, Audiocraft’s MusicGen is a powerful and controllable music generation model. Looking ahead, Audiocraft has exciting future potential for advancements in AI-generated music. Whether you’re a musician or an AI enthusiast, Audiocraft’s MusicGen opens up a world of creative possibilities.