The `serve.py` script can be used to create an inference server for any of the
supported models. Provide the HuggingFace model name and tensor-parallelism
(or use the default values and run `$ python serve.py` for a single-GPU
`mistralai/Mistral-7B-v0.1` deployment):

```bash
$ python serve.py --model "mistralai/Mistral-7B-v0.1" --tensor-parallel 1
```
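For reference, the persistent deployment can also be started directly from Python through the DeepSpeed-MII API. This is a minimal sketch, assuming the `mii` package is installed and that the call below matches what `serve.py` does with the arguments shown above:

```python
import mii

# Start a persistent DeepSpeed-MII deployment for the model.
# tensor_parallel controls how many GPUs the model is sharded across.
mii.serve("mistralai/Mistral-7B-v0.1", tensor_parallel=1)
```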
Connect to the persistent deployment and generate text with `client.py`. Provide
the HuggingFace model name, maximum number of generated tokens, and prompt(s)
(or, if you are using the default values, run `$ python client.py`):

```bash
$ python client.py --model "mistralai/Mistral-7B-v0.1" --max-new-tokens 128 --prompts "DeepSpeed is" "Seattle is"
```
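Equivalently, a client can be created and queried from Python with the MII client API; a short sketch, with the prompts and token limit mirroring the command above:

```python
import mii

# Connect to the running deployment by model name and generate text.
client = mii.client("mistralai/Mistral-7B-v0.1")
response = client.generate(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
print(response)
```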
Shut down the persistent deployment with `terminate.py`. Provide the HuggingFace
model name (or, if you are using the default values, run `$ python terminate.py`):

```bash
$ python terminate.py --model "mistralai/Mistral-7B-v0.1"
```
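The deployment can likewise be torn down from Python via the client object; a minimal sketch under the same assumptions as above:

```python
import mii

# Reconnect to the deployment and ask the server to shut down.
client = mii.client("mistralai/Mistral-7B-v0.1")
client.terminate_server()
```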