
Extend DeepSpeed inference initialization API with a 'quantize_groups' argument #3519

Open · sakogan wants to merge 7 commits into master
Conversation

@sakogan (Contributor) commented May 11, 2023

This PR allows setting the number of quantization groups to be used during inference for the given model weights. Once merged, one will be able to control this number by passing the quantize_groups argument to a deepspeed.init_inference call, as is done in this inference test script: microsoft/DeepSpeedExamples@master...sakogan:DeepSpeedExamples:inference-test-enhance
(Note that the latter example will be submitted to the DeepSpeedExamples repo as a separate PR if this PR is merged.)

Setting 'quantize_groups' is allowed only when quantization is enabled, i.e., when dtype is set to int8.
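For illustration, a minimal sketch of how the proposed argument would be used (the model name, group count, and other arguments here are arbitrary examples, not part of this PR):

import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Sketch of the API proposed in this PR; model and settings are examples only.
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-7b1")
engine = deepspeed.init_inference(
    model,
    dtype=torch.int8,                 # 'quantize_groups' is only valid with int8
    replace_with_kernel_inject=True,  # corresponds to --use_kernel in the test script
    quantize_groups=32,               # the argument added by this PR
)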

Commit message: …ng deepspeed.init_inference
The number of weight quantization groups can now be set by passing the 'quantize_groups' argument to deepspeed.init_inference.
@RezaYazdaniAminabadi (Contributor) commented:
Hi @sakogan,

The changes in this PR look good to me. May I ask that you link the corresponding PR on the example side so that I know how to use this feature?
Thanks,
Reza

@sakogan (Contributor, Author) commented Aug 29, 2023

Thanks for the review, @RezaYazdaniAminabadi. I will submit the corresponding PR to the examples repo shortly.

@sakogan (Contributor, Author) commented Aug 31, 2023

@RezaYazdaniAminabadi, I opened PR #713 on the examples repo, which adds the corresponding argument to the inference test script.

@awan-10 self-assigned this on Sep 8, 2023
@lekurile (Contributor) left a comment:

Were you able to test this change locally with a model?

I see your PR in DeepSpeedExamples as well, which looks good to me. Could you please provide a reproduction command?

Thanks,
Lev

@sakogan (Contributor, Author) commented Dec 11, 2023

Yes. Here is an example command:
deepspeed /path/to/DeepSpeedExamples/inference/huggingface/text-generation/inference-test.py --batch_size 1 --use_kernel --use_meta_tensor --dtype int8 --model bigscience/bloom-7b1 --quantize_groups 32

Comment on lines +340 to +347
    # Set the number of weight quantization groups if an optional 'quantize_groups' argument is given
    if "quantize_groups" in config_dict:
        if not ("dtype", torch.int8) in config_dict.items():
            raise ValueError("'dtype' argument expected int8 when 'quantize_groups' argument is provided")
        quant = QuantizationConfig()
        quant.weight.q_groups = config_dict.pop("quantize_groups")
        config_dict["quant"] = quant

Contributor:

I believe you are adding quantize_groups as a shortcut for the admittedly convoluted current quantize config settings? For example, with this you could just pass quantize_groups=2 rather than quant={"weight":{"q_groups":2}}.

But perhaps we should look into how we can simplify the quantize settings or at the very least add this logic to the DeepSpeedInferenceConfig class as a pydantic validator (so that the config logic is consolidated there).
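For illustration, here is a minimal sketch (not from this PR or from DeepSpeed itself) of how such a pydantic pre-validator could consolidate the shortcut. The classes below are simplified stand-ins for DeepSpeed's actual DeepSpeedInferenceConfig and QuantizationConfig, and the code assumes pydantic v1-style validators:

from typing import Any

import torch
from pydantic import BaseModel, root_validator


class WeightQuantSketch(BaseModel):
    q_groups: int = 1


class QuantSketch(BaseModel):
    weight: WeightQuantSketch = WeightQuantSketch()


class InferenceConfigSketch(BaseModel):
    dtype: Any = torch.float16
    quant: QuantSketch = QuantSketch()

    @root_validator(pre=True)
    def _map_quantize_groups(cls, values):
        # Accept the flat 'quantize_groups' shortcut and rewrite it into the
        # nested form, enforcing the int8-only restriction from this PR.
        if "quantize_groups" in values:
            if values.get("dtype") != torch.int8:
                raise ValueError("'quantize_groups' requires dtype=torch.int8")
            values["quant"] = {"weight": {"q_groups": values.pop("quantize_groups")}}
        return values


# Both the shortcut and the nested form yield the same config:
cfg = InferenceConfigSketch(dtype=torch.int8, quantize_groups=32)
assert cfg.quant.weight.q_groups == 32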

@sakogan (Contributor, Author) replied:

Yes, I suppose one could use a config file with quant={"weight":{"q_groups":2}} instead. What I am suggesting here is a simple way to control this setting from the command line. I do agree that there might be better ways of achieving that than special-casing this argument in init_inference.
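For concreteness, a sketch of the two equivalent forms discussed above (assuming `model` is an already-loaded Hugging Face model; the group count is an arbitrary example):

import torch
import deepspeed

# Existing nested-config form:
engine = deepspeed.init_inference(model, dtype=torch.int8,
                                  quant={"weight": {"q_groups": 2}})
# Flat shortcut proposed by this PR:
engine = deepspeed.init_inference(model, dtype=torch.int8, quantize_groups=2)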

@kiucho commented Jul 7, 2024

Hi, does DeepSpeed support int8 inference? With the following command I get the error below.
Command: deepspeed --num_gpus 1 inference-test.py --use_kernel --dtype int8 --model EleutherAI/gpt-j-6b
Error: !!!! kernel execution error. (m: 4096, n: 3, k: 12288, error: 15)
