
GQA for smaller models #635

Open
Dampfinchen opened this issue Aug 7, 2023 · 1 comment
Labels
new-feature New feature or request

Comments

@Dampfinchen

Dampfinchen commented Aug 7, 2023

Hello,

Could we please have 13B and 7B models with the updated architecture that includes grouped query attention? A lot of people run these models on machines with limited memory, and GQA would really help them use a larger context. Right now, a 4096-token context simply needs too much memory to be feasible with good speed and quality on most common hardware.

Thank you!
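To illustrate the memory argument: the KV cache is the part of inference memory that grows with context length, and GQA shrinks it by sharing each key/value head across a group of query heads. Below is a minimal sketch of the arithmetic, assuming Llama-2-7B-style dimensions (32 layers, 32 attention heads, head dim 128) and, hypothetically, the 8 KV heads that the 70B GQA model uses; the exact head counts for a would-be GQA 7B are an assumption, not a published spec.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Size of the KV cache for one sequence.

    Per layer we store a K tensor and a V tensor, each of shape
    (seq_len, n_kv_heads, head_dim); bytes_per_elem=2 assumes fp16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# 7B-style MHA: every query head has its own KV head (assumed dims).
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)

# Hypothetical GQA variant: 8 KV heads shared among the 32 query heads.
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=4096)

print(f"MHA KV cache: {mha / 2**20:.0f} MiB")   # 2048 MiB
print(f"GQA KV cache: {gqa / 2**20:.0f} MiB")   # 512 MiB
```

Under these assumptions, going from 32 to 8 KV heads cuts the 4096-token KV cache from 2 GiB to 512 MiB per sequence, which is exactly the kind of saving that would make long contexts practical on low-memory machines.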

@WuhanMonkey WuhanMonkey added the new-feature New feature or request label Sep 5, 2023
@minowau

minowau commented Jul 13, 2024

I have tried to address this in #1139. Please take a look.

3 participants