Trying to speed up responses. #5784
-
One thing to keep in mind is that the bigger the context (essentially, the prompt), the more the model has to process when generating new text. If you don't need it to remember things from earlier in the conversation, there's a setting in the Parameters tab to truncate the prompt: it crops off the top once the prompt reaches that size and keeps it that way. Keep in mind that, depending on the system you're using, there may be some special content inserted at the beginning that doesn't get reinserted, and that could lead to the model forgetting some important instructions (sometimes you won't notice it, because the AI may recognize the pattern in what remains of the text and go along with it; but even then it may start drifting over time). A rough sketch of the idea is below.
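For anyone curious what that kind of truncation looks like, here is a minimal Python sketch of the concept, not the webui's actual implementation; the function name, the tokenizer interface, and the prompt layout are all illustrative assumptions. It crops the oldest turns first while always re-inserting the system prompt, which is exactly the part that can get lost if a naive truncation just chops from the top:

```python
# Minimal sketch of system-prompt-preserving context truncation.
# Assumptions (not the webui's real API): the tokenizer exposes
# .encode(text) -> list of token ids, and the prompt is built as
# system prompt + newline-joined conversation turns.

def truncate_prompt(system_prompt: str, history: list[str],
                    tokenizer, max_tokens: int) -> str:
    """Drop the oldest history turns until the prompt fits the budget,
    keeping the system prompt pinned at the top."""
    def n_tokens(text: str) -> int:
        return len(tokenizer.encode(text))

    kept = list(history)
    # Crop from the top (oldest turns first) until everything fits.
    while kept and n_tokens(system_prompt) + sum(n_tokens(t) for t in kept) > max_tokens:
        kept.pop(0)
    return "\n".join([system_prompt] + kept)
```

A truncation that instead sliced the raw token stream from the front would silently eat the system prompt, which is the failure mode described above.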
-
I am still very new to this, and I am playing around with different models and such, but I have noticed that my responses are slow and getting slower.
```
Output generated in 424.34 seconds (0.12 tokens/s, 49 tokens, context 2379, seed 1092770479)
Output generated in 479.82 seconds (0.24 tokens/s, 116 tokens, context 2520, seed 1564826966)
```
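(For reference, the tokens/s figure in those lines is just the token count divided by the wall-clock time; a quick Python check reproduces it:)

```python
# Reproduce the reported throughput from the two log lines: tokens / seconds.
for seconds, tokens in [(424.34, 49), (479.82, 116)]:
    print(f"{tokens / seconds:.2f} tokens/s")  # prints 0.12 and 0.24
```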
What can I do to speed this up? I am currently using TheBloke_LLaMA2-13B-Tiefighter-AWQ.
Any help or insight is appreciated. If you need anything else in terms of gear, I am using an RTX 3070 and a 12th Gen i7-12700K (20 cores).