Extend DeepSpeed inference initialization API with a 'quantize_groups' argument #3519
base: master
Conversation
…ng deepspeed.init_inference

The number of weight quantization groups can now be set by passing the `quantize_groups` argument to `deepspeed.init_inference`.
Hi @sakogan, the changes in this PR look good to me. May I ask that you link the PR on the example side so that I know how to use this feature?
Thanks for the review @RezaYazdaniAminabadi. I will submit the corresponding PR to the examples repo shortly.
@RezaYazdaniAminabadi I opened PR #713 on the examples repo that adds the corresponding argument to the inference test script.
Were you able to test this change locally with a model?
I see your PR in DeepSpeedExamples as well, which looks good to me. Could you please provide a reproduction command?
Thanks,
Lev
Yes. Here is an example command:
```python
# Set the number of weight quantization groups if an optional 'quantize_groups' argument is given
if "quantize_groups" in config_dict:
    if config_dict.get("dtype") != torch.int8:
        raise ValueError("'dtype' argument expected int8 when 'quantize_groups' argument is provided")
    quant = QuantizationConfig()
    quant.weight.q_groups = config_dict.pop("quantize_groups")
    config_dict["quant"] = quant
```
I believe you are adding `quantize_groups` as a shortcut for the admittedly convoluted current quantize config settings? For example, with this you could just pass `quantize_groups=2` rather than `quant={"weight": {"q_groups": 2}}`.

But perhaps we should look into how we can simplify the quantize settings, or at the very least add this logic to the `DeepSpeedInferenceConfig` class as a pydantic validator (so that the config logic is consolidated there).
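That consolidation could look roughly like the following. This is a dependency-free sketch that uses a dataclass `__post_init__` hook in place of an actual pydantic validator; `InferenceConfigSketch` and its fields are illustrative stand-ins, not DeepSpeed's real `DeepSpeedInferenceConfig` schema.

```python
# Sketch only: the real suggestion is a pydantic validator on
# DeepSpeedInferenceConfig; a dataclass keeps this example dependency-free.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WeightQuantConfig:
    q_groups: int = 1  # number of weight quantization groups


@dataclass
class QuantConfig:
    weight: WeightQuantConfig = field(default_factory=WeightQuantConfig)


@dataclass
class InferenceConfigSketch:
    dtype: str = "fp16"                     # stand-in for a torch dtype
    quantize_groups: Optional[int] = None   # the proposed shortcut argument
    quant: QuantConfig = field(default_factory=QuantConfig)

    def __post_init__(self):
        # Consolidated validation: fold the shortcut into the nested quant
        # config, enforcing that int8 quantization is actually enabled.
        if self.quantize_groups is not None:
            if self.dtype != "int8":
                raise ValueError("'quantize_groups' requires dtype == int8")
            self.quant.weight.q_groups = self.quantize_groups
```

With this shape, `InferenceConfigSketch(dtype="int8", quantize_groups=2)` populates `quant.weight.q_groups` automatically, and the check lives in the config class rather than in `init_inference`.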
Yes, I suppose one could use a config file with `quant={"weight": {"q_groups": 2}}` instead. What I am suggesting here is a simple way to control this setting from the command line. I do agree that there might be better ways of achieving that than special-casing this argument in `init_inference`.
Hi, does DeepSpeed support int8 inference? With the following command I get this error:
This PR allows setting the number of quantization groups to be used during inference for the given model weights. Once merged, one should be able to control this number by passing the `quantize_groups` argument to a `deepspeed.init_inference` call, e.g., as is done in this inference test script: microsoft/DeepSpeedExamples@master...sakogan:DeepSpeedExamples:inference-test-enhance (Note that the latter example will be submitted to the DeepSpeedExamples repo as a separate PR if this PR is merged.)

Setting `quantize_groups` is allowed only when quantization is enabled, i.e., when `dtype` is set to `int8`.
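To make the new argument's behavior concrete, here is a self-contained sketch of the translation this PR performs inside `init_inference`. The `INT8` string and the `SimpleNamespace`-based quant object are stubs standing in for `torch.int8` and DeepSpeed's `QuantizationConfig`; only the folding logic mirrors the PR.

```python
from types import SimpleNamespace

INT8 = "torch.int8"  # stub standing in for torch.int8


def make_quant_config():
    # Stub standing in for DeepSpeed's QuantizationConfig
    return SimpleNamespace(weight=SimpleNamespace(q_groups=1))


def apply_quantize_groups(config_dict):
    """Mimic the PR: fold 'quantize_groups' into the nested quant config."""
    if "quantize_groups" in config_dict:
        # The shortcut is only valid when int8 quantization is enabled
        if config_dict.get("dtype") != INT8:
            raise ValueError(
                "'dtype' argument expected int8 when 'quantize_groups' is provided")
        quant = make_quant_config()
        quant.weight.q_groups = config_dict.pop("quantize_groups")
        config_dict["quant"] = quant
    return config_dict


cfg = apply_quantize_groups({"dtype": INT8, "quantize_groups": 4})
```

After the call, `cfg["quant"].weight.q_groups` is 4 and the shortcut key has been consumed, which is the same end state as passing `quant={"weight": {"q_groups": 4}}` directly.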