AudioLDM 2 is a latent text-to-audio diffusion model capable of generating realistic audio samples given any text input.
AudioLDM 2 was proposed in the paper AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining by Haohe Liu et al.
The model takes a text prompt as input and predicts the corresponding audio. It can generate text-conditioned sound effects, human speech, and music.
In this tutorial, we will try out the pipeline, convert the models that back it one by one, and run an interactive app with Gradio.
This notebook demonstrates how to convert and run AudioLDM 2 using OpenVINO.
The notebook contains the following steps:
- Create a pipeline with PyTorch models using the Diffusers library.
- Convert the PyTorch models to OpenVINO IR format using the model conversion API.
- Run the AudioLDM 2 pipeline with OpenVINO.
This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to get started.
For details, please refer to Installation Guide.