Best way to run GPT4-x-Alpaca-13B on a 3060 Ti / 16 GB RAM? #749
Unanswered
Nightnightlight asked this question in Q&A
2 comments · 3 replies
-
Are you sure you're using GPU inference? llama.cpp-based (GGML) models run on the CPU.
2 replies
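For reference, a minimal sketch of what GPU offload looks like through llama-cpp-python, assuming it was installed with cuBLAS support (the model filename and layer count below are assumptions for illustration, not taken from this thread):

```python
# A minimal sketch of GPU offload with llama-cpp-python, assuming it was
# built with cuBLAS, e.g.:
#   CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="gpt4-x-alpaca-13b.ggmlv3.q4_0.bin",  # hypothetical local path
    n_gpu_layers=32,  # layers offloaded to VRAM; 0 (the default) is CPU-only
    n_ctx=2048,       # context window size
)

out = llm("### Instruction:\nSay hello.\n\n### Response:\n", max_tokens=64)
print(out["choices"][0]["text"])
```

If `n_gpu_layers` is left at its default of 0, everything runs on the CPU, which would explain a GPU sitting mostly idle during generation.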
-
using
1 reply
-
I can just barely run it with the installer's default settings, at 1.09 tokens per second. Is there any way to improve that speed, or is this the best my setup is gonna get?
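To check whether a settings change actually beats that 1.09 tokens/s baseline, a rough timing sketch along these lines could help (same hypothetical model path as above; everything here except the baseline figure is an assumption):

```python
# A rough benchmark sketch for comparing settings, assuming llama-cpp-python
# and a local GGML model file (the path is hypothetical).
import time
from llama_cpp import Llama

llm = Llama(
    model_path="gpt4-x-alpaca-13b.ggmlv3.q4_0.bin",  # hypothetical local path
    n_gpu_layers=32,  # vary this to see how much offloading helps
)

start = time.time()
out = llm("Explain what a token is.", max_tokens=64)
elapsed = time.time() - start

n_generated = out["usage"]["completion_tokens"]
print(f"{n_generated / elapsed:.2f} tokens/s")
```

Raising `n_gpu_layers` until VRAM is nearly full is usually the biggest lever on an 8 GB card like the 3060 Ti, though a 4-bit 13B model is roughly 7-8 GB, so some layers may have to stay on the CPU either way.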