[Model] Add use_qk_norm option for Cohere model #2877

Open · wants to merge 4 commits into main
Conversation

tlopex (Contributor) commented Sep 2, 2024

This PR adds the use_qk_norm option for Cohere-series models such as Command-R-Plus.

tlopex (Contributor, Author) commented Sep 2, 2024

cc @MasterJH5574
Honestly, I'm not sure whether splitting and re-combining qkv like this is a good approach. Also, Command-R-Plus is so large that my device cannot hold it, so it would be extremely helpful if you could give me some suggestions and test it.
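
Concretely, what I mean by split-and-combine is roughly the following minimal sketch (PyTorch-style for illustration only; the helper name `qkv_with_qk_norm` and the per-head LayerNorm placement are my assumptions, not the exact code in this PR):

```python
import torch
import torch.nn.functional as F

def qkv_with_qk_norm(qkv, num_q_heads, num_kv_heads, head_dim,
                     q_norm_weight, k_norm_weight, eps=1e-5):
    """Split the fused qkv projection, LayerNorm q and k per head,
    then combine back into the fused layout.

    qkv: (batch, seq, (num_q_heads + 2 * num_kv_heads) * head_dim)
    q_norm_weight, k_norm_weight: (head_dim,)
    """
    b, s, _ = qkv.shape
    q, k, v = torch.split(
        qkv,
        [num_q_heads * head_dim,
         num_kv_heads * head_dim,
         num_kv_heads * head_dim],
        dim=-1,
    )
    # Normalize each head's query/key vector over head_dim.
    q = q.view(b, s, num_q_heads, head_dim)
    k = k.view(b, s, num_kv_heads, head_dim)
    q = F.layer_norm(q, (head_dim,), weight=q_norm_weight, eps=eps)
    k = F.layer_norm(k, (head_dim,), weight=k_norm_weight, eps=eps)
    # Re-combine so the existing fused attention path can be reused.
    return torch.cat([q.flatten(2), k.flatten(2), v], dim=-1)
```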

MasterJH5574 (Member) commented

@tlopex Sorry for the delayed response. Split-and-combine is okay for now. Could you try the 4-bit quantized version on your end? If that's not possible, I can find a way to test on my side.

tlopex (Contributor, Author) commented Sep 10, 2024

@MasterJH5574
My device has 16 GB of GPU memory. Even after INT4 quantization, the 55.1B-parameter c4ai-command-r-plus-4bit model is estimated to still require around 27.5 GB of memory, so it will be difficult for me to test this on my side. I may need your assistance.
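
For reference, the back-of-envelope calculation behind that estimate (weights only; the KV cache and activations would add more):

```python
params = 55.1e9        # Command-R-Plus parameter count
bits = 4               # INT4 quantization
weight_gb = params * bits / 8 / 1e9
print(f"{weight_gb:.2f} GB")   # 27.55 GB, well above 16 GB of GPU memory
```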

tlopex (Contributor, Author) commented Oct 9, 2024

@MasterJH5574 Hi! Could you find a way to test this, so we can determine if the PR is ready for merging?
