AMD thread #3759
Comments
Why no AMD for Windows? |
@MistakingManx there is, but you have to DIY a llama-cpp-python build. It will be harder to set up than on Linux. |
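For reference, a DIY Windows build of llama-cpp-python for an AMD card would look roughly like the sketch below. This is only an outline under assumptions (AMD HIP SDK and a C++ toolchain already installed; the HIPBLAS switch is the one mentioned later in this thread), not a verified recipe:

```powershell
# Hypothetical sketch: build llama-cpp-python against HIP/ROCm on Windows (PowerShell).
# Assumes the AMD HIP SDK and a working C++ build toolchain are installed.
$env:CMAKE_ARGS = "-DLLAMA_HIPBLAS=on"   # HIPBLAS flag, as noted further down in this thread
$env:FORCE_CMAKE = "1"
pip install llama-cpp-python --no-cache-dir
```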
Does someone have a working AutoGPTQ setup? Mine was really slow when I installed the wheel: https://github.com/PanQiWei/AutoGPTQ/releases/download/v0.4.2/auto_gptq-0.4.2+rocm5.4.2-cp310-cp310-linux_x86_64.whl When building from source, the text generation is much faster but the output is just gibberish. I am running on a RX 6750 XT, if that is important. |
Why exactly do models prefer a GPU instead of a CPU? Mine runs quickly on CPU, but OBS kills it off because OBS itself uses so much. |
Speed is what users prefer: an AMD GPU comparable to a 3090 may run at ~20 t/s for a 34B model. |
I have an AMD Radeon RX 5500 XT, is that good? |
I'm having trouble getting the WebUI to even launch. I'm using ROCm 6.1 on openSUSE Tumbleweed Linux with a 6700 XT. I used the one-click installer to set it up (and I selected ROCm support), but after the installation finished it just threw an error:
|
Same issue here, still no solution for me. Can anyone shed some light on this? Thanks in advance. |
Okay, so this is definitely not ideal, but I found that VERY carefully following the manual installation guide and then uninstalling bitsandbytes makes it work. I'm still figuring things out, but at least it works now. |
So you installed that modified version of bitsandbytes for ROCm? Or what exactly did you do? Thanks in advance. |
I am not sure which version is newer, but I used https://github.com/agrocylo/bitsandbytes-rocm.

git clone git@github.com:agrocylo/bitsandbytes-rocm.git
cd bitsandbytes-rocm/
export PATH=/opt/rocm/bin:$PATH # Add ROCm to $PATH
export HSA_OVERRIDE_GFX_VERSION=10.3.0 HCC_AMDGPU_TARGET=gfx1030
make hip
python setup.py install

Make sure the environment variables are also set when you start the webui. Depending on your GPU you might need to change the GPU target or GFX version. |
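For reference, launching the webui with those variables set might look like the following sketch (the server flags depend on your setup; the values shown are the RX 6xxx ones from the commands above):

```bash
# Sketch: start the webui with the same ROCm overrides used for the build above
export HSA_OVERRIDE_GFX_VERSION=10.3.0
export HCC_AMDGPU_TARGET=gfx1030
python server.py
```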
Saying it takes 6 seconds is not that helpful for getting an idea of the performance you get, because that depends on the length of the output. Take a look at the console: after every generation it prints the generation speed in tokens per second. With my RX 6750 XT I got about 35 t/s with a 7B GPTQ model. |
I have |
@RBNXI This is caused by
Yes, it worked really well on my PC until I broke my installation with an update of the repository. I plan on improving the one click installer and/or the setup guide of the oobabooga webui for AMD to make the setup easier, if I ever get it running again :) |
Cool, I'll be waiting for that then.
I saw it and tried to build it, but it gave an error and I got tired of trying stuff. I just thought "well, having to do so many steps and then hitting so many errors must mean it's just not ready yet...". But I could try again another day when I have more time, if I can fix that error. Thanks. |
@RBNXI What error did you get?
Yes I can understand that. The setup with NVIDIA is definitely easier. |
I don't remember the error, I'm sorry. But I have a question for when I try again: the command you used to clone (git clone git@github.com:agrocylo/bitsandbytes-rocm.git) gave me an error saying the link you used is private or something like that. Is it OK to just clone with the default link to the repo? |
Yes you can of course use the link from the repo directly. You probably mean this one: https://github.com/agrocylo/bitsandbytes-rocm.git |
I tried again and got the same result. I followed the installation tutorial, everything works fine, then I run it and get the split error. Then I compiled bitsandbytes from that repo (now it worked), tried to run again, and got the same split error again... |
installing bitsandbytes-rocm is the only way I've been able to make this work. The new install doesn't seem to work for the 7900XTX |
AMD Setup Step-by-Step Guide (WIP)

I finally got my setup working again (by reinstalling everything). Here is a step-by-step guide on how I got it running. I tested all steps on Manjaro, but they should work on other Linux distros. I have no idea how the steps can be transferred to Windows; please leave a comment if you have a solution for Windows.
Step 1: Install dependencies (should be similar to the one click installer except the last step)
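The exact commands for this step did not survive above; based on commands quoted later in this thread, the step presumably looked something like the sketch below (the ROCm 5.4.2 torch index and the requirements file name are taken from those later posts, and the order matters per an edit near the end of the thread):

```bash
# Reconstructed sketch of the dependency step, based on later comments in this thread.
# Install the ROCm build of torch FIRST, then the webui requirements,
# otherwise a CPU-only torch may be pulled in.
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
pip install -r requirements_nocuda.txt   # file name as quoted later in the thread
```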
If you get an error installing torch, try running

Step 2: Fix bitsandbytes
I found the following forks which should work for ROCm but got none of them working. If you find a working version please give some feedback.
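If none of the forks build, a workaround reported earlier in this thread is to simply remove bitsandbytes so the webui runs without it:

```bash
# Workaround mentioned earlier in the thread: drop bitsandbytes entirely
pip uninstall -y bitsandbytes
```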
Step 3: Install AutoGPTQ
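The original commands for this step are not preserved above; a source build along the following lines is roughly what the AutoGPTQ README suggested for ROCm at the time (the ROCM_VERSION variable and version number are assumptions on my part, so verify against the README you are using):

```bash
# Sketch: build AutoGPTQ from source against ROCm (version number is an assumption)
git clone https://github.com/PanQiWei/AutoGPTQ.git
cd AutoGPTQ
ROCM_VERSION=5.4.2 pip install -v .
```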
If the installation fails try applying the patch provided by this article.
Step 4: Exllama
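The original instructions for this step are also missing; at the time the webui loaded Exllama from a repositories/ folder, so the step was presumably something like this sketch (the path and loader behavior are assumptions):

```bash
# Sketch: place exllama where the webui expects it (path is an assumption)
mkdir -p repositories
cd repositories
git clone https://github.com/turboderp/exllama
```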
Step 4.5: ExllamaV2
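Again, the exact commands are missing above; building ExllamaV2 from source so it compiles against the ROCm build of torch is one plausible version of this step (a sketch, not the author's original instructions):

```bash
# Sketch: install ExllamaV2 from source so its extension builds against the ROCm torch
git clone https://github.com/turboderp/exllamav2
cd exllamav2
pip install .
```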
If you get an error running ExllamaV2, try installing the nightly version of torch for ROCm 5.6 (should be released as a stable version soon):

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.6 --force-reinstall

Step 5: llama-cpp-python
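The build command for this step is not preserved here; a HIPBLAS build along the lines of the one posted much later in this thread is presumably what was meant (a minimal sketch; a more elaborate variant with explicit compilers and GPU targets appears further down):

```bash
# Sketch: build llama-cpp-python with HIP/ROCm support instead of the CPU-only wheel
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
```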
You might need to add the

I hope you can get it working with this guide :) I would appreciate some feedback on how this guide worked for you, so we can create a complete and robust setup guide for AMD devices (and maybe even update the one click installer based on the guide).

Notes on 7xxx AMD GPUs

Remember that you have to change the GFX version for the environment variables: As described by this article, you should make sure to install/set up ROCm without OpenCL, as this might cause problems with HIP. You also need to install the nightly version of torch for ROCm 5.6 instead of ROCm 5.4.2 (should be released as a stable version soon):

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.6 |
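To illustrate the note above about 7xxx cards, the overrides would look roughly like this (hypothetical values on my part; gfx1100 matches the target used in a build command later in this thread, so adjust to your exact GPU):

```bash
# Hypothetical RDNA3 (7900-class) values; adjust to your GPU
export HSA_OVERRIDE_GFX_VERSION=11.0.0
export HCC_AMDGPU_TARGET=gfx1100
```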
@RBNXI What model are you using? Which loader are you using? Usually this error means the loader failed to load the model. As explained in my guide above, you have to do extra steps for AutoGPTQ and Exllama/Exllama_HF. Also note that with AutoGPTQ you often have to define the |
Awesome guide, thanks, I'll try it when I can. I tried with different --n-gpu-layers values and got the same result. Also, AutoGPTQ installation failed with
Edit 2: I tried running a GPTQ model anyway, and it starts to load into VRAM so the GPU is detected, but fails with:
|
@RBNXI I found this issue in the ROCm repo discussing the RX 6600. According to this, the RX 6600 should work. For 6xxx cards, llama.cpp usually runs on CPU because the prebuilt python package is only built with CPU support; this is why you need to install it with the command from my guide. Regarding AutoGPTQ: I think you just copied the last lines, not the real error that broke the installation, so I am not sure what the problem is. Maybe check your ROCm version and change the

I usually run the webui with |
I don't have rocminfo installed, should I? But clinfo does show my GPU. I'll try to reinstall again and see if it works now. I did install rocm-hip-sdk, and I'm using Arch. Also, I'm running it in a miniconda environment, is that a problem? The ROCm I have installed is from the Arch repository, I think it's 5.6.0, is that a problem? If I change the version in the command (pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2 -> pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6.0) it says ERROR: Could not find a version that satisfies the requirement torchvision (from versions: none) |
I'm trying to install, still errors everywhere. First of all the bitsandbytes installation fails, so I have to use the pip one.
What am I doing wrong? I'm following the guide... this is so frustrating... Could it be that I have to install ROCm 5.4.2 from some rare repository, or compile it myself, or something obscure like that? It says pytorch is installed without ROCm support, even though I installed it with

Edit: Steps 1 and 2 in the install dependencies section are in a different order: if you run pip install requirements_nocuda first, it will install pytorch without ROCm support... |
Did anyone figure out yet how to get AMD working with recent builds of textgen and things like Llama 3.1 GGUF?
The only way I've managed it is by building my own

I am slowly making some build scripts that can handle the weird requirements of the webui, and may at some point publish these. But it's early days on that front. |
Thought I'd posted this, but don't see it in searching, so I'll post again... I wasn't ever able to use the built in scripts to satisfaction. |
I have tried manually building this way, and while it builds successfully, the resulting llama-cpp-python package seems to only run on CPU regardless. Not sure if it's related, but when installing this way I encounter an error when trying to load it (something to do with the llama-cpp-python-cuda vs non-cuda package name). My fix has been to tweak the file
I don't know if this is the right way to do this, as I'm not 100% sure how the error is caused in the first place. |
AMD support seems broken for me on Linux, with ROCm 6.0 in

At first the installation with the AMD option was successful. When I tried to do inference with a model loaded onto the GPU, it was constantly crashing with the following error:
Looks like

Although running |
(Could it be that the wheel was compiled with CUDA parameters?) Weird, since it loads into the GPU fine and just crashes on inference. After some digging,
Approximately like that:
Not sure about the

I think we could improve this repo by giving the user an option to either pull a pre-built wheel, or build

Also, thinking we could install the ROCm wheel under a different python package name:

Hope it helps someone :) |
Some additional info on building

Here is a small script that might be useful to the people here. It pulls down |
@dgdguk Your script worked fine on Arch. I had the conda environment enabled but I'm not sure if it's necessary. The conda env created by the start script uses Python 3.11, so my only suggestion is to replace |
@feffy380 Thanks for the feedback. I've incorporated that change. The script builds for whatever version of Python is running it, so building under a Conda environment gets you compatibility with that Conda environment. That being said, I had weird issues getting it to build within a |
@dgdguk @codeswhite - thanks for your scripts! For some reason on Arch OS (git version 2.45.2), the git command for me is
|
I was able to get this working in Windows by following these steps:
The reason for the |
You're right! My typo. I edited it to avoid future confusion. |
Unfortunately, macOS does not support the /dev/kfd device, and you cannot directly access AMD GPUs in Docker on macOS. The best approach is to use a Linux environment or a cloud service that supports GPU access. If you have further questions or need assistance with a specific setup, feel free to ask! |
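For comparison, on Linux the usual way to expose an AMD GPU to a container is to pass the kernel devices through explicitly, along the lines of this sketch (the image name is a placeholder):

```bash
# Sketch: typical ROCm device passthrough on Linux (not possible on macOS, as noted above)
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video some-rocm-image:latest
```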
@FoxMcloud5655 Not sure if you're still around, but can you confirm that adding the

One other alternative is for Windows users to use WSL. |
I am trying to install text-gen-webui on Windows, but I do not understand which script you are trying to update, @dgdguk. Is it possible to share it with me so I can also try it on Windows 10? Please guide me. |
The llama-cpp-python workflow for AMD is broken again, does anyone know what causes this error? https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/actions/runs/11079365489/job/30788288861
|
A quick search says that this issue was fixed in ROCm 6.0.2. I see that this build uses 5.7.1, which technically shouldn't be an issue... But I was personally using 6.1 with no issues once the environment was set up on Windows 11. |
The target isn't required, as far as I know. I used it to make sure the wheel was targeted to my own system with as small a footprint as possible. Not even sure if it helped or not. However, I can confirm that the HIPBLAS argument is required; a seemingly unrelated error appears if you don't include it. |
Currently the project uses rocm 5.6.1, and the error also happened with 5.7.1. It seems like most Linux distributions ship 5.7.1 by default, so I'm not sure if upgrading would be reasonable. |
The error I found here was from splitting the batch.

CMAKE_ARGS="/opt/rocm/llvm/bin/clang HIP_PATH=/opt/rocm -DLLAMA_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1100 -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DGGML_HIPBLAS=on -DCMAKE_C_FLAGS='-march=native -mavx2' -DCMAKE_CXX_FLAGS='-march=native -mavx2'" FORCE_CMAKE=1 CMAKE_BUILD_PARALLEL_LEVEL=16 pip install llama-cpp-python==0.2.90 |
@oobabooga As others have said, the issue is likely due to the outdated version of ROCm that you are using. One thing I don't quite get: it's pretty clear that one of the "features" you have invested a lot of effort in is following upstream project builds to a frankly unreasonable degree, building almost every new commit, but you don't seem to apply that zeal to the frameworks. The most recent frameworks you're building against are ROCm 5.6.1 (May 2023) and CUDA 12.2 (June 2023). That's more than a year's worth of features and bug fixes you're leaving on the table. Realistically, I think you should probably cull some of the older frameworks that you're building against so that you can build against newer targets. There's seriously no reason to build for every minor version of CUDA, i.e. 12.0, 12.1 and 12.2, or 11.6, 11.7 and 11.8. Just build 12.2, and possibly 11.8 if there were hardware deprecations in CUDA 12. The same is true for ROCm (more so, really, as ROCm has been having a lot more feature enablement than CUDA). |
If you see an improvement to the wheels, send a PR to the repositories where they are compiled. https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels |
My point was more along the lines that it's not just AMD, which you have repeatedly disavowed interest in supporting due to lack of hardware. That I can understand. I still think you would be best served not building any AMD wheels at all, however, given that they are frequently non-functional and you lack the resources to fix them. What's not understandable is leaving CUDA on an old version while targeting bleeding-edge code for your builds of binaries. That's your wheelhouse. I've said this before: text-generation-webui is currently trying to be two separate things. It's both a user interface component, and a distribution of binaries of upstream stuff to run that user interface. That second component gets in the way of anyone else stepping up to provide binaries for hardware that you do not support, and given how you are maintaining the CUDA side, I'd argue it's not exactly well maintained anyway. And just to be clear on something: you cannot say "someone submit a PR" and then merge the code without testing, which you have explicitly said you cannot do for ROCm. Anyone could submit a PR that builds a malicious package, which you would then release under your own name, completely in ignorance. This would not be a good thing. If you're accepting PRs, then you or the project needs a way of testing. |
I second what @dgdguk said above. Don't get me wrong; I'm extremely appreciative of the work you've done so far. But like they said, I would much rather you refuse to support something entirely, simply providing build instructions, than provide non-working binaries. There are a number of users who might be happy to donate an AMD card for testing, or provide funds explicitly for purchasing an AMD card for you to test with, me included. But the last thing I want is for bad practices to be followed. Open source is a great thing when good practices are followed; let's keep it that way. |
@alexmazaltov Apologies for not seeing your comment earlier. The script that's been mentioned is https://gist.github.com/dgdguk/95ac9ef99ef17c44c7e44a3b150c92b4. However, I'm currently seeing some errors when testing with current

@oobabooga On rereading my earlier comment, I realise that I might have sounded a bit more negative than I intended. Similar to @FoxMcloud5655, I do appreciate the work that you've done. However, right now you do not have the capacity to maintain ROCm functionality, and some decisions of |
It's probably poorly written, but if you're having trouble with llama.cpp try what I wrote. |
I have updated all wheels to ROCm 6.1.2. As I said, if you can see an improvement to the wheels (like updating the ROCm version), just PR the repositories above. Usually only a small change is needed. |
@LeonardoSidney No, this is a new issue that occurs with

@oobabooga As noted above, while I will check when I get back to my main rig in a few days, given the current issues I've observed when using the current llama-cpp-python, it is quite likely that wheels built with the current source code do not actually function for generating text. I reiterate: if you don't have the ability to test your code, you cannot maintain it. It doesn't matter how small the change is: just because things build doesn't mean they work, and I know a lack of testing has bitten this project in the past. |
I managed to get remote access to my main PC earlier than I thought (I was having Internet issues, but they spontaneously fixed themselves). Current status of the @oobabooga ROCm builds: they load, but on running any text generation I get the following hard crash on an AMD RX 7900 XT:
@oobabooga This might be an upstream bug, but again: if you're not testing your releases, don't release them. As you cannot test the ROCm distribution at the moment, it may be prudent not to release it. I also think there may be issues with the CPU builds, because regenerating text produced the same output in at least some cases, which didn't seem right.

Editing to add: The crash does not appear to be an upstream issue; building locally works at the moment. The regenerating-text bug appears to be in upstream. |
This is what I got on AMD hardware |
@VasiliyMooduckovich Yeah, that's about par for the course at the moment. oobabooga isn't able to test the builds (and there doesn't seem to be any automated testing at all for any of the automated builds, which is kind of wild), so things being broken is expected. As far as I can tell, your best bet at a functioning |
This thread is dedicated to discussing the setup of the webui on AMD GPUs.
You are welcome to ask questions as well as share your experiences, tips, and insights to make the process easier for all AMD users.