Separating LLM model GPU from TTS model GPU #6068
Replies: 2 comments
-
This sounds like a question for alltalk_tts. Basically change everything that says …
-
Yeah — the indices are zero-based, i.e. the first card is 0 and "1" means the second card, IIRC. I haven't done SLI in a long time, but that's how I understood the env variable. My advice, since AllTalk respects the `CUDA_VISIBLE_DEVICES=x` env var, would be to:

1. Go to the script.py for AllTalk and assign the desired CUDA index near the top, before anything hooks into CUDA — for the first card use 0, for the second use 1, and so on (see the sketch below).
2. Do the same for any other extension you want to segregate onto its own CUDA device. E.g., server.py/the one-click launcher would set CUDA_VISIBLE_DEVICES=3 while AllTalk's script.py sets CUDA_VISIBLE_DEVICES=0.

There may be some cross talk — e.g. in the above, the main webui could still dip into AllTalk's CUDA — but it's close. One would hope it gets the idea of what the user is trying to do, but that's hoping.

Edit: oh yeah, don't forget to remove `set CUDA_VISIBLE_DEVICES=x` from your global environment altogether, or keep it out of your env completely — a globally set value would of course make this method do nothing.
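A minimal sketch of the script.py idea described above, assuming the component runs in a process that hasn't initialized CUDA yet; the file path and the chosen index are illustrative, not AllTalk's actual code:

```python
# Illustrative top of an extension's script.py (e.g. extensions/alltalk_tts/script.py).
# CUDA_VISIBLE_DEVICES is only read when CUDA initializes, so it must be set
# before torch (or anything that imports torch) is loaded in this process.
import os

# Pin this component to physical GPU 1. Inside the process, that card is
# renumbered and shows up as cuda:0.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch

print(torch.cuda.device_count())      # 1 — only the pinned card is visible
print(torch.cuda.get_device_name(0))  # name of physical GPU 1
```

The cross-talk caveat in the reply applies: if the webui process has already touched CUDA before this module is imported, the variable is ignored for that process, so this only works cleanly when the component runs as its own process.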
-
I have several Nvidia graphics cards. I can run the webui on any combination of cards, no problem, by setting:

```
export CUDA_VISIBLE_DEVICES=0,1,2,3
```

But that sets the same GPUs for all tasks. I can put, for example:

```
export CUDA_VISIBLE_DEVICES=0
```

as the first line of my startup_linux.sh file, and the webui will start up and load everything on GPU 0. But now I want to load alltalk_tts on a separate GPU. If I put, for example, `export CUDA_VISIBLE_DEVICES=1` in launch.sh, it still just uses GPU 0 to load and run the TTS model.

Does anyone know how I can use one set of GPUs for the main LLM and another for the TTS and STT models? I think I could run alltalk in a Docker container, but I didn't want to go down that road, as I finally got it running as an extension inside oobabooga.
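The likely reason the second export does nothing: CUDA_VISIBLE_DEVICES is read once per process when CUDA initializes, and alltalk_tts loads as an extension inside the same webui process, so it inherits whatever mask the webui started with. A generic (not oobabooga-specific) way to check what a given process was actually handed:

```python
import os
import torch

# The mask this process inherited from its environment (None if unset).
print(os.environ.get("CUDA_VISIBLE_DEVICES"))

# What torch actually sees after CUDA initialization; visible devices are
# always renumbered from 0, whatever the physical IDs in the mask were.
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```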