From 4d7b7e1ddf5bd2deb7a55198171dbdd14f97ef1a Mon Sep 17 00:00:00 2001
From: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Date: Mon, 4 Dec 2023 16:43:10 +0000
Subject: [PATCH] transformers section in readme (#240)

Adding HF's Transformers inference instructions for SeamlessM4T
---
 README.md          |  9 +++++---
 docs/m4t/README.md | 58 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index d9d3c534..e7eca642 100644
--- a/README.md
+++ b/README.md
@@ -10,9 +10,9 @@ SeamlessM4T models support the tasks of:
 - Text-to-text translation (T2TT)
 - Automatic speech recognition (ASR)

-:star2: We are releasing SemalessM4T v2, an updated version with our novel *UnitY2* architecture. This new model improves over SeamlessM4T v1 in quality as well as inference latency in speech generation tasks.
+:star2: We are releasing SeamlessM4T v2, an updated version with our novel *UnitY2* architecture. This new model improves over SeamlessM4T v1 in quality as well as inference latency in speech generation tasks.

-To learn more about the collection of SeamlessM4T models, the approach used in each, their language coverage and their performance, visit the [SeamlessM4T README](docs/m4t/README.md) or [🤗 Model Card](https://huggingface.co/facebook/seamless-m4t-v2-large)
+To learn more about the collection of SeamlessM4T models, the approach used in each, their language coverage and their performance, visit the [SeamlessM4T README](docs/m4t/README.md) or [🤗 Model Card](https://huggingface.co/facebook/seamless-m4t-v2-large).

 ## SeamlessExpressive

@@ -124,7 +124,7 @@ You can also run the demo locally, by cloning the space from [here](https://hugg

 ## Running SeamlessM4T & SeamlessExpressive [Gradio](https://github.com/gradio-app/gradio) demos locally

-To launch the same space demo we host on HuggingFace locally,
+To launch the same demo Space we host on Hugging Face locally:

 ```bash
 cd demo
@@ -132,6 +132,9 @@ pip install -r requirements.txt
 python app.py
 ```

+SeamlessM4T is also available in the 🤗 Transformers library. For more details, refer to the [SeamlessM4T docs](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t_v2)
+or this hands-on [Google Colab](https://colab.research.google.com/github/ylacombe/explanatory_notebooks/blob/main/seamless_m4t_hugging_face.ipynb).
+
 # Resources and usage
 ## Model
 ### SeamlessM4T models

diff --git a/docs/m4t/README.md b/docs/m4t/README.md
index 3db4fae7..ebf1241a 100644
--- a/docs/m4t/README.md
+++ b/docs/m4t/README.md
@@ -13,6 +13,8 @@ This unified model enables multiple tasks without relying on multiple separate m
 - Text-to-text translation (T2TT)
 - Automatic speech recognition (ASR).

+> [!NOTE]
+> SeamlessM4T v2 and v1 are also supported in the 🤗 Transformers library; see [the dedicated section below](#transformers-usage) for details.

 ## SeamlessM4T v1
 The v1 version of SeamlessM4T is a multitask adaptation of the *UnitY* architecture [(Inaguma et al., 2023)](https://aclanthology.org/2023.acl-long.872/).
@@ -23,7 +25,6 @@

 The v2 version of SeamlessM4T is a multitask adaptation of our novel *UnitY2* architecture.
 *UnitY2* with its hierarchical character-to-unit upsampling and non-autoregressive text-to-unit decoding considerably improves over SeamlessM4T v1 in quality and inference speed.
-
 ![SeamlessM4T architectures](seamlessm4t_arch.svg)

 ## SeamlessM4T models
@@ -162,6 +163,61 @@ The `target` column specifies whether a language is supported as target speech (

 Note that seamlessM4T-medium supports 200 languages in the text modality, and is based on NLLB-200 (see full list in [asset card](src/seamless_communication/cards/unity_nllb-200.yaml))

+## Transformers usage
+
+SeamlessM4T is available in the 🤗 Transformers library, requiring minimal dependencies. Steps to get started:
+
+1. First install the 🤗 [Transformers library](https://github.com/huggingface/transformers) from main and [sentencepiece](https://github.com/google/sentencepiece):
+
+```
+pip install git+https://github.com/huggingface/transformers.git sentencepiece
+```
+
+2. Run the following Python code to generate speech samples. Here the target language is Russian:
+
+```py
+import torchaudio
+from transformers import AutoProcessor, SeamlessM4Tv2Model
+
+processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
+model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")
+
+# from text
+text_inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")
+audio_array_from_text = model.generate(**text_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
+
+# from audio
+audio, orig_freq = torchaudio.load("https://www2.cs.uic.edu/~i101/SoundFiles/preamble10.wav")
+audio = torchaudio.functional.resample(audio, orig_freq=orig_freq, new_freq=16_000)  # must be a 16 kHz waveform array
+audio_inputs = processor(audios=audio, return_tensors="pt")
+audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
+```
+
+3. Listen to the audio samples either in a Jupyter notebook:
+
+```py
+from IPython.display import Audio
+
+sample_rate = model.config.sampling_rate
+Audio(audio_array_from_text, rate=sample_rate)
+# Audio(audio_array_from_audio, rate=sample_rate)
+```
+
+Or save them as a `.wav` file using a third-party library, e.g. `scipy`:
+
+```py
+import scipy.io.wavfile
+
+sample_rate = model.config.sampling_rate
+scipy.io.wavfile.write("out_from_text.wav", rate=sample_rate, data=audio_array_from_text)
+# scipy.io.wavfile.write("out_from_audio.wav", rate=sample_rate, data=audio_array_from_audio)
+```
+
+> [!NOTE]
+> For more details on running SeamlessM4T inference with the 🤗 Transformers library, refer to the
+> [SeamlessM4T v2 docs](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t_v2), the
+> [SeamlessM4T v1 docs](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t) or this hands-on [Google Colab](https://colab.research.google.com/github/ylacombe/scripts_and_notebooks/blob/main/v2_seamless_m4t_hugging_face.ipynb).
+
 ## Citation
 For *UnitY*, please cite :
 ```bibtex
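
The patch above only exercises the speech-generation path (T2ST and S2ST). The same Transformers checkpoint also covers the text tasks (T2TT, S2TT, ASR): per the 🤗 Transformers SeamlessM4T docs, passing `generate_speech=False` to `generate` returns token IDs that the processor can decode. A minimal sketch along those lines, reusing the checkpoint and language codes from the snippet in step 2:

```py
from transformers import AutoProcessor, SeamlessM4Tv2Model

processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

# T2TT: ask generate() for text tokens instead of a waveform
text_inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")
output_tokens = model.generate(**text_inputs, tgt_lang="rus", generate_speech=False)

# the first element of the output holds the generated token IDs
translated_text = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
print(translated_text)
```

Feeding the `audio_inputs` from step 2 through the same call gives S2TT; setting `tgt_lang` to the audio's own language turns that into ASR.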
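
The checkpoints are several gigabytes and generation on CPU is slow, so it is worth placing the model on a GPU when one is available. This is ordinary PyTorch device handling rather than anything SeamlessM4T-specific; a sketch, assuming a CUDA-capable machine:

```py
import torch
from transformers import AutoProcessor, SeamlessM4Tv2Model

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large").to(device)

# the processor output is a BatchEncoding, which supports .to() for device placement
text_inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt").to(device)

# move the generated waveform back to the CPU before converting to numpy
audio_array = model.generate(**text_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
```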