Python dependencies:
pip install -r requirements.txt
Examples can be run as follows:
deepspeed --num_gpus [number of GPUs] inference-test.py --name [model name/path] --batch_size [batch] --dtype [data type]
Command:
deepspeed --num_gpus 1 inference-test.py --name facebook/opt-125m
Output:
in=DeepSpeed is a machine learning framework out=DeepSpeed is a machine learning framework based on TensorFlow. It was first released in 2015, then improved on 2016, and is now a major addition to the deep learning tools on GitHub. ------------------------------------------------------------
Command:
deepspeed --num_gpus 1 inference-test.py --name bigscience/bloom-3b --batch_size 2
Output:
in=DeepSpeed is a machine learning framework out=DeepSpeed is a machine learning framework that takes a machine learning algorithm and then uses those algorithms to find out how the user interacts with the environment. The company announced in July 2017 that it is ready for release - in 2018. It has been working on deep learning for about 6 years, ------------------------------------------------------------ in=He is working on out=He is working on the new video game 'Bloodborne's' expansion pack. Check out the trailer here: Bloodborne's expansion pack includes a complete remaster of the original game, including over 120 maps, playable characters, new quests, and the possibility to bring Blood ------------------------------------------------------------
The text-generation examples make use of the DSPipeline utility class, a helper that handles loading models with DeepSpeed meta tensors and is meant to mimic the Hugging Face transformers pipeline.
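As an illustration, below is a minimal usage sketch. The module path, constructor arguments, and call signature are assumptions based on the description above, not the exact API, so refer to the DSPipeline source in this directory for the real interface:

import torch
import deepspeed
from utils import DSPipeline  # assumed module name for the DSPipeline helper

# Build the pipeline around a small model (argument names are illustrative).
pipe = DSPipeline(model_name="facebook/opt-125m",
                  dtype=torch.float16,
                  device=0)

# Wrap the underlying model with DeepSpeed-Inference before generating.
pipe.model = deepspeed.init_inference(pipe.model,
                                      mp_size=1,
                                      dtype=torch.float16,
                                      replace_with_kernel_inject=True)

# Run generation much like a Hugging Face pipeline call.
outputs = pipe(["DeepSpeed is a machine learning framework"])
print(outputs)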
The BLOOM model is quite large, and the way DeepSpeed loads its checkpoints is a little different from other HF models. Specifically, we use meta tensors to initialize the model before loading the weights:
with deepspeed.OnDevice(dtype=self.dtype, device="meta"):
This reduces the total system/GPU memory needed to load the model across multiple GPUs and makes checkpoint loading faster. The DSPipeline class handles these differences when loading the model and running inference on it.
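To make the pattern concrete, here is a hedged sketch of meta-tensor initialization followed by DeepSpeed-Inference. The checkpoint JSON path is a placeholder, and the exact arguments used by inference-test.py may differ; this only illustrates the idea described above:

import torch
import deepspeed
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("bigscience/bloom-3b")

# Instantiate the model on the "meta" device: no real weight memory is
# allocated yet, only the module structure and shapes.
with deepspeed.OnDevice(dtype=torch.float16, device="meta"):
    model = AutoModelForCausalLM.from_config(config)
model = model.eval()

# DeepSpeed materializes the weights from the checkpoint description so that
# each rank only ever holds its own partition of the model.
model = deepspeed.init_inference(model,
                                 mp_size=2,                      # number of GPUs
                                 dtype=torch.float16,
                                 checkpoint="checkpoints.json",  # placeholder path
                                 replace_with_kernel_inject=True)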