Describe the Issue
Apologies, I am by no means an expert, and I am still learning.
Recently, after upgrading KoboldCPP, I have been seeing strange repetition and other issues with responses that frequently just don't quite make sense given the content.
I am well within the context limit (currently testing Qwen 2.5 32B with Q4 quantization), yet the responses frequently contradict what is in the context right before them. I did some digging and noticed that in the startup output I see:
llm_load_print_meta: model ftype = Q3_K - Large
even though this is a Q4 model. Out of curiosity, I downloaded llamacpp and ran it against the same model, and it immediately detected the correct Q4 quantization.
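For anyone else trying to narrow this down, here is a minimal sketch of how the quantization recorded in the file itself can be checked, independent of which frontend loads it. It assumes the `gguf` Python package that ships with the llama.cpp repo (`pip install gguf`); the filename is just a placeholder, not my actual file.

```python
# Minimal sketch: read the quantization info straight from a GGUF file.
# Assumes the `gguf` Python package from the llama.cpp repo (pip install gguf).
from collections import Counter

from gguf import GGUFReader

reader = GGUFReader("qwen2.5-32b-q4.gguf")  # placeholder path

# general.file_type is a single integer label (a llama.cpp LLAMA_FTYPE id)
# written at conversion time; as far as I can tell, the "model ftype" line
# in the startup output is derived from it.
ftype = reader.fields.get("general.file_type")
if ftype is not None:
    print("general.file_type =", int(ftype.parts[ftype.data[0]][0]))

# The per-tensor quantization types are what actually affect output quality,
# so count those as well rather than trusting the label alone.
counts = Counter(t.tensor_type.name for t in reader.tensors)
for type_name, n in counts.most_common():
    print(f"{type_name}: {n} tensors")
```

If llamacpp and KoboldCPP print different ftypes for the same file, comparing against this output should show which one is reading the metadata correctly, and whether the tensors really are Q4 or the file was actually converted as Q3_K.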
Additional Information:
Running in Podman, using a Quadro P6000, 24GB VRAM.
Seems to work otherwise, no errors or anything visible. Just curious if someone knew of a magic flag or something to try? Or maybe I am being stupid and missing something.
Fair enough! Thank you for the response. I am still exploring the various flags and options as I figure out the issues. I am pretty sure this is user error somehow...