[Model] Add use_qk_norm option for Cohere model #2877

Open · wants to merge 4 commits into main
Conversation

tlopex (Contributor) commented Sep 2, 2024

This PR adds the use_qk_norm option for Cohere-series models such as Command-R-Plus.

tlopex (Contributor, Author) commented Sep 2, 2024

cc @MasterJH5574
Honestly, I'm not sure whether splitting and re-combining qkv like this is a good approach. Also, Command-R-Plus is so large that my device cannot hold it, so it would be extremely helpful if you could give me some suggestions and test it.
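
Concretely, what I mean by split-and-combine is roughly the following minimal sketch (PyTorch-style for illustration only; the helper name `qkv_with_qk_norm` and the per-head LayerNorm placement are my assumptions, not the exact code in this PR):

```python
import torch
import torch.nn.functional as F

def qkv_with_qk_norm(qkv, num_q_heads, num_kv_heads, head_dim,
                     q_norm_weight, k_norm_weight, eps=1e-5):
    """Split the fused qkv projection, LayerNorm q and k per head,
    then combine back into the fused layout.

    qkv: (batch, seq, (num_q_heads + 2 * num_kv_heads) * head_dim)
    q_norm_weight, k_norm_weight: (head_dim,)
    """
    b, s, _ = qkv.shape
    q, k, v = torch.split(
        qkv,
        [num_q_heads * head_dim,
         num_kv_heads * head_dim,
         num_kv_heads * head_dim],
        dim=-1,
    )
    # Normalize each head's query/key vector over head_dim.
    q = q.view(b, s, num_q_heads, head_dim)
    k = k.view(b, s, num_kv_heads, head_dim)
    q = F.layer_norm(q, (head_dim,), weight=q_norm_weight, eps=eps)
    k = F.layer_norm(k, (head_dim,), weight=k_norm_weight, eps=eps)
    # Re-combine so the existing fused attention path can be reused.
    return torch.cat([q.flatten(2), k.flatten(2), v], dim=-1)
```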

MasterJH5574 (Member) commented

@tlopex Sorry for the delayed response. Split-and-combine is okay for now. Could you try the 4-bit quantized version on your end? If that's not possible, I can find a way to test on my side.

tlopex (Contributor, Author) commented Sep 10, 2024

@MasterJH5574
My device has 16 GB of GPU memory. Even after INT4 quantization, the 55.1B-parameter c4ai-command-r-plus-4bit model is estimated to still require around 27.5 GB of memory, so it will be difficult for me to test this on my side. I may need your assistance.
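
For reference, the back-of-envelope calculation behind that estimate (weights only; the KV cache and activations would add more):

```python
params = 55.1e9        # Command-R-Plus parameter count
bits = 4               # INT4 quantization
weight_gb = params * bits / 8 / 1e9
print(f"{weight_gb:.2f} GB")   # 27.55 GB, well above 16 GB of GPU memory
```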

tlopex (Contributor, Author) commented Oct 9, 2024

@MasterJH5574 Hi! Could you find a way to test this, so we can determine if the PR is ready for merging?
