
GQA for smaller models #635

Open
Dampfinchen opened this issue Aug 7, 2023 · 1 comment
Labels
new-feature New feature or request

Comments

@Dampfinchen

Dampfinchen commented Aug 7, 2023

Hello,

Could we please have 13B and 7B models with the updated architecture that includes grouped query attention? A lot of people run these models on machines with limited memory, and GQA would really help them use a larger context. Right now, a 4096-token context simply needs too much memory to be feasible with good speed and quality on most common hardware.

Thank you!
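To illustrate the memory argument: the KV cache is the part of inference memory that grows with context length, and GQA shrinks it by sharing each key/value head across a group of query heads. Below is a minimal sketch of the arithmetic, assuming Llama-2-7B-style dimensions (32 layers, 32 attention heads, head dim 128) and, hypothetically, the 8 KV heads that the 70B GQA model uses; the exact head counts for a would-be GQA 7B are an assumption, not a published spec.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Size of the KV cache for one sequence.

    Per layer we store a K tensor and a V tensor, each of shape
    (seq_len, n_kv_heads, head_dim); bytes_per_elem=2 assumes fp16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# 7B-style MHA: every query head has its own KV head (assumed dims).
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)

# Hypothetical GQA variant: 8 KV heads shared among the 32 query heads.
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=4096)

print(f"MHA KV cache: {mha / 2**20:.0f} MiB")   # 2048 MiB
print(f"GQA KV cache: {gqa / 2**20:.0f} MiB")   # 512 MiB
```

Under these assumptions, going from 32 to 8 KV heads cuts the 4096-token KV cache from 2 GiB to 512 MiB per sequence, which is exactly the kind of saving that would make long contexts practical on low-memory machines.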

@WuhanMonkey WuhanMonkey added the new-feature New feature or request label Sep 5, 2023
@minowau

minowau commented Jul 13, 2024

I have tried to address this in #1139. Please take a look.

3 participants