Python dependencies:
pip install -r requirements.txt
Examples can be run as follows:
deepspeed --num_gpus [number of GPUs] inference-test.py --name [model name/path] --batch_size [batch] --dtype [data type]
Command:
deepspeed --num_gpus 1 inference-test.py --name facebook/opt-125m
Output:
in=DeepSpeed is a machine learning framework out=DeepSpeed is a machine learning framework based on TensorFlow. It was first released in 2015, then improved on 2016, and is now a major addition to the deep learning tools on GitHub. ------------------------------------------------------------
Command:
deepspeed --num_gpus 1 inference-test.py --name bigscience/bloom-3b --batch_size 2
Output:
in=DeepSpeed is a machine learning framework out=DeepSpeed is a machine learning framework that takes a machine learning algorithm and then uses those algorithms to find out how the user interacts with the environment. The company announced in July 2017 that it is ready for release - in 2018. It has been working on deep learning for about 6 years, ------------------------------------------------------------ in=He is working on out=He is working on the new video game 'Bloodborne's' expansion pack. Check out the trailer here: Bloodborne's expansion pack includes a complete remaster of the original game, including over 120 maps, playable characters, new quests, and the possibility to bring Blood ------------------------------------------------------------
The text-generation examples make use of the DSPipeline utility class, a helper that handles loading models with DeepSpeed meta tensors and is meant to mimic the Hugging Face transformers pipeline.
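As an illustration, below is a minimal usage sketch. The module path, constructor arguments, and call signature are assumptions based on the description above, not the exact API, so refer to the DSPipeline source in this directory for the real interface:

import torch
import deepspeed
from utils import DSPipeline  # assumed module name for the DSPipeline helper

# Build the pipeline around a small model (argument names are illustrative).
pipe = DSPipeline(model_name="facebook/opt-125m",
                  dtype=torch.float16,
                  device=0)

# Wrap the underlying model with DeepSpeed-Inference before generating.
pipe.model = deepspeed.init_inference(pipe.model,
                                      mp_size=1,
                                      dtype=torch.float16,
                                      replace_with_kernel_inject=True)

# Run generation much like a Hugging Face pipeline call.
outputs = pipe(["DeepSpeed is a machine learning framework"])
print(outputs)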
The BLOOM model is quite large, and the way DeepSpeed loads its checkpoints is a little different from other HF models. Specifically, we use meta tensors to initialize the model before loading the weights:
with deepspeed.OnDevice(dtype=self.dtype, device="meta"):
This reduces the total system/GPU memory needed to load the model across multiple GPUs and makes checkpoint loading faster. The DSPipeline class handles these differences when loading the model and running inference on it.
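To make the pattern concrete, here is a hedged sketch of meta-tensor initialization followed by DeepSpeed-Inference. The checkpoint JSON path is a placeholder, and the exact arguments used by inference-test.py may differ; this only illustrates the idea described above:

import torch
import deepspeed
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("bigscience/bloom-3b")

# Instantiate the model on the "meta" device: no real weight memory is
# allocated yet, only the module structure and shapes.
with deepspeed.OnDevice(dtype=torch.float16, device="meta"):
    model = AutoModelForCausalLM.from_config(config)
model = model.eval()

# DeepSpeed materializes the weights from the checkpoint description so that
# each rank only ever holds its own partition of the model.
model = deepspeed.init_inference(model,
                                 mp_size=2,                      # number of GPUs
                                 dtype=torch.float16,
                                 checkpoint="checkpoints.json",  # placeholder path
                                 replace_with_kernel_inject=True)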