
Extend DeepSpeed inference initialization API with a 'quantize_groups' argument #3519

Open · sakogan wants to merge 7 commits into master
Conversation

@sakogan (Contributor) commented May 11, 2023

This PR allows setting the number of quantization groups to be used during inference for the given model weights. Once merged, one will be able to control this number by passing the quantize_groups argument to a deepspeed.init_inference call, as is done in this inference test script: microsoft/DeepSpeedExamples@master...sakogan:DeepSpeedExamples:inference-test-enhance
(Note that the latter example will be submitted to the DeepSpeedExamples repo as a separate PR if this PR is merged.)

Setting 'quantize_groups' is allowed only when quantization is enabled, i.e., when dtype is set to int8.
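For illustration, a minimal sketch of how the proposed argument would be used (the model name, group count, and other arguments here are arbitrary examples, not part of this PR):

import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Sketch of the API proposed in this PR; model and settings are examples only.
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-7b1")
engine = deepspeed.init_inference(
    model,
    dtype=torch.int8,                 # 'quantize_groups' is only valid with int8
    replace_with_kernel_inject=True,  # corresponds to --use_kernel in the test script
    quantize_groups=32,               # the argument added by this PR
)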

Commit message: …ng deepspeed.init_inference
The number of weight quantization groups can now be set by passing the 'quantize_groups' argument to deepspeed.init_inference.
@RezaYazdaniAminabadi (Contributor) commented:
Hi @sakogan,

The changes in this PR look good to me. May I ask that you link the corresponding PR on the example side so that I know how to use this feature?
Thanks,
Reza

@sakogan (Contributor, Author) commented Aug 29, 2023

Thanks for the review, @RezaYazdaniAminabadi. I will submit the corresponding PR to the examples repo shortly.

@sakogan (Contributor, Author) commented Aug 31, 2023

@RezaYazdaniAminabadi, I opened PR #713 on the examples repo, which adds the corresponding argument to the inference test script.

@awan-10 self-assigned this on Sep 8, 2023
@lekurile (Contributor) left a comment:

Were you able to test this change locally with a model?

I see your PR in DeepSpeedExamples as well, which looks good to me. Could you please provide a reproduction command?

Thanks,
Lev

@sakogan (Contributor, Author) commented Dec 11, 2023

Yes. Here is an example command:
deepspeed /path/to/DeepSpeedExamples/inference/huggingface/text-generation/inference-test.py --batch_size 1 --use_kernel --use_meta_tensor --dtype int8 --model bigscience/bloom-7b1 --quantize_groups 32

Comment on lines +340 to +347
    # Set the number of weight quantization groups if an optional 'quantize_groups' argument is given
    if "quantize_groups" in config_dict:
        if not ("dtype", torch.int8) in config_dict.items():
            raise ValueError("'dtype' argument expected int8 when 'quantize_groups' argument is provided")
        quant = QuantizationConfig()
        quant.weight.q_groups = config_dict.pop("quantize_groups")
        config_dict["quant"] = quant

Contributor:

I believe you are adding quantize_groups as a shortcut for the admittedly convoluted current quantize config settings? For example, with this you could just pass quantize_groups=2 rather than quant={"weight":{"q_groups":2}}.

But perhaps we should look into how we can simplify the quantize settings or at the very least add this logic to the DeepSpeedInferenceConfig class as a pydantic validator (so that the config logic is consolidated there).
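For illustration, here is a minimal sketch (not from this PR or from DeepSpeed itself) of how such a pydantic pre-validator could consolidate the shortcut. The classes below are simplified stand-ins for DeepSpeed's actual DeepSpeedInferenceConfig and QuantizationConfig, and the code assumes pydantic v1-style validators:

from typing import Any

import torch
from pydantic import BaseModel, root_validator


class WeightQuantSketch(BaseModel):
    q_groups: int = 1


class QuantSketch(BaseModel):
    weight: WeightQuantSketch = WeightQuantSketch()


class InferenceConfigSketch(BaseModel):
    dtype: Any = torch.float16
    quant: QuantSketch = QuantSketch()

    @root_validator(pre=True)
    def _map_quantize_groups(cls, values):
        # Accept the flat 'quantize_groups' shortcut and rewrite it into the
        # nested form, enforcing the int8-only restriction from this PR.
        if "quantize_groups" in values:
            if values.get("dtype") != torch.int8:
                raise ValueError("'quantize_groups' requires dtype=torch.int8")
            values["quant"] = {"weight": {"q_groups": values.pop("quantize_groups")}}
        return values


# Both the shortcut and the nested form yield the same config:
cfg = InferenceConfigSketch(dtype=torch.int8, quantize_groups=32)
assert cfg.quant.weight.q_groups == 32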

@sakogan (Contributor, Author) replied:

Yes, I suppose one could use a config file with quant={"weight":{"q_groups":2}} instead. What I am suggesting here is a simple way to control this setting from the command line. I do agree that there might be better ways of achieving that than special-casing this argument in init_inference.
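For concreteness, a sketch of the two equivalent forms discussed above (assuming `model` is an already-loaded Hugging Face model; the group count is an arbitrary example):

import torch
import deepspeed

# Existing nested-config form:
engine = deepspeed.init_inference(model, dtype=torch.int8,
                                  quant={"weight": {"q_groups": 2}})
# Flat shortcut proposed by this PR:
engine = deepspeed.init_inference(model, dtype=torch.int8, quantize_groups=2)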

@kiucho commented Jul 7, 2024

Hi, does DeepSpeed support int8 inference? With the following command I get the error below.
Command: deepspeed --num_gpus 1 inference-test.py --use_kernel --dtype int8 --model EleutherAI/gpt-j-6b
Error: !!!! kernel execution error. (m: 4096, n: 3, k: 12288, error: 15)
