Radeon VII #175
Hi, that would be really interesting! Most of the ROCm components already seem to have in-code support for the Radeon VII/gfx906 in place by default, and a couple of weeks ago I went through all the common places that typically require patching. I have verified that everything should now build for these cards, but I have not been able to test functionality on a VII. That said, if you have time it would be great if you could try the build and test it. These steps should help you get started.
And once the build has progressed past rocminfo and amd-smi, those commands are a good way to start checking the build.
HIP and OpenCL compiler tests should also be doable pretty soon (no need to wait for the whole build to finish).
Once the build has finished, if things work well, then PyTorch should also have support for your GPU.
If those work, then you can also build llama_cpp, stable-diffusion-webui and vllm with the command: ./babs.sh -b binfo/extra/ai_tools.blist. All of those also have their own example apps which you can run either on the console or by starting their web server and connecting to it via browser. (I can help more later if needed.)
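For reference, a rough sketch of that flow; the babs.sh invocations are the ones that appear later in this thread, and the exact flags may differ in your checkout:

```bash
# Sketch only: clone, build the SDK, then run the first sanity checks.
git clone https://github.com/lamikr/rocm_sdk_builder.git
cd rocm_sdk_builder
./babs.sh -b                                # full SDK build (takes hours)
source /opt/rocm_sdk_612/bin/env_rocm.sh    # environment for the freshly built SDK
rocminfo                                    # should list the gfx906 card(s)
amd-smi list                                # basic SMI sanity check
./babs.sh -b binfo/extra/ai_tools.blist     # optional: llama_cpp, stable-diffusion-webui, vllm
```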
Thanks :-) I'll try that asap.
Hello @lamikr, thank you for your amazing work! I am really glad I found this repo. I have two AMD MI60 cards (gfx906). I will also compile this repo and share test results with you! I am specifically interested in vLLM batch/concurrent inference speeds. So far, I have not been able to compile vLLM with the default installations of ROCm 6.2.2 and vLLM. There is also a composable_kernel-based flash attention implementation here - https://github.com/ROCm/flash-attention (v2.6.3). This FA compiles fine with the default ROCm 6.2.2 on Ubuntu 22.04, but the exllamav2 backend with llama3 8B started generating gibberish text (exllamav2 works fine without FA2, but it is very slow without it). I hope this repo fixes this gibberish text generation problem with FA2. Thanks again!
Quick update. I did a fresh installation of Ubuntu 24.04.1 today, which takes around 6.5 GB of SSD storage. It installs Nvidia GPU drivers by default. I assumed this repo would install AMD GPU drivers, but no, it did not. This should probably be mentioned in the README with a brief description of how to install the GPU drivers. So, I installed the AMD GPU drivers myself.
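A typical driver-installation sequence on Ubuntu 24.04 looks roughly like the one below; the amdgpu-install package name, version and URL are placeholders rather than the exact commands used here:

```bash
# Sketch: install the amdgpu DKMS driver via AMD's installer helper,
# then add the user to the GPU groups (version/URL below are placeholders).
wget https://repo.radeon.com/amdgpu-install/6.2.2/ubuntu/noble/amdgpu-install_6.2.60202-1_all.deb
sudo apt install ./amdgpu-install_6.2.60202-1_all.deb
sudo amdgpu-install --usecase=dkms
sudo usermod -aG render,video "$USER"   # log out and back in afterwards
```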
Also, there were several packages missing in Ubuntu which I had to install after I saw error messages in ./install_deps.sh.
Only after that was I able to run ./install_deps.sh without errors. One more piece of feedback: can you please include a global progress indicator in the terminal logs that shows how many packages have been built and how many remain?
OK, I want to report an error that occurred while building the source code.
I am attaching the full error output and short info about my PC. I ran the following commands and they worked.
rocminfo correctly showed those two MI60 cards. The hipcc and OpenCL examples worked without errors. Please let me know if I need to install CUDA libraries or, otherwise, how to fix the error above. Thanks!
@lamikr, I think the error I am seeing might be related to spack/spack#45411, but I am not sure how to implement the fix here. Let me know, thanks!
Quick update: the installation is working after I removed all Nvidia drivers and restarted my PC.
Now Ubuntu is using the X.Org server with Nouveau drivers.
Finally, the ROCm SDK was installed on my PC after 5 hours. It takes ~90 GB of space in rocm_sdk_builder, 8.5 GB in the triton folder, ~2 GB in the lib/x86_64-linux-gnu folder (mostly LLVM) and ~20 GB in the opt/rocm_sdk_612 folder, for a total of about 120 GB of files! Is there a way to create an installable version of my current setup (all 120 GB)? Building it is huge and time-consuming. For comparison, a ROCm installation from binaries takes around 30 GB.
Here are the benchmark results. I think the flash attention test failed.
That error above is causing llama.cpp to not run any models on the GPU. Let me file a bug.
@lamikr, I finally got around to doing the testing. Initially the build went smooth-ish after setting HIP_COMPILER=clang.
Hi, thanks for the reports. The flash attention support for gfx906 would need to be implemented in aotriton. Although I do not have a gfx906, I will start a new build for it with Ubuntu 24.04 and try to reproduce the build errors. If you have some fixes, would you be able to make a pull request?
Hey @lamikr, the build is on Linux Mint Debian Edition; if need be I can make pull requests.
I have multiple versions of it under the src_projects directory.
I am not sure what is causing it. Maybe the install directory /opt/rocm_sdk_612 should also be removed and then a clean build started. Let's try to reset everything and then start a fresh build.
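A sketch of such a reset, using the directories mentioned elsewhere in this thread:

```bash
# Sketch: wipe the installed SDK and the previous build artifacts, then rebuild.
sudo rm -rf /opt/rocm_sdk_612    # old install directory
rm -rf builddir                  # per-project build directories (see paths later in this thread)
./babs.sh -b                     # start a fresh build
```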
I have not yet solved the llama.cpp error with gfx906, but I am trying to add more debugging related to that in the next build.
I can get as far as running the HIP and CL hello worlds, but cannot run the run-and-save-benchmarks script. Excerpts from the CMake output:
-- MIGraphX is using hipRTC
... in its INTERFACE_INCLUDE_DIRECTORIES. Possible reasons include: ...
CMake Error in src/py/CMakeLists.txt:
... in its INTERFACE_INCLUDE_DIRECTORIES. Possible reasons include: ...
CMake Warning (dev) at /opt/rocm_sdk_612/share/rocmcmakebuildtools/cmake/ROCMTest.cmake:230 (install):
RPATH entries for target 'test_verify' will not be escaped in the ...
-- Generating done (2.5s)
I have now made a couple of docker images from the quite new rocm_sdk_builder for different GPU architectures.
The docker image for CDNA cards supports gfx906 and works at least with the MI50, so I believe it could also work with the Vega VII. The problem is that even xz-compressed these images are about 6 GB each (and uncompressed about 50 GB). These images have every application from rocm sdk builder included. Do you know whether it is possible to upload xz-compressed docker images to Docker Hub?
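For context, the export/compress step that produces such a file is roughly the following (container and file names are placeholders):

```bash
# Sketch: export a finished container's filesystem and xz-compress it for sharing.
podman export rocm_sdk_container | xz -9 -T0 > rocm_sdk_cdna.tar.xz
```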
@Said-Akbar, @cb88 and @commandline-be: here are the commands I used to import and run it (TMPDIR was needed because the image is so big that the import would fail if the regular memory-backed /tmp dir were used).
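A sketch of that import-and-run sequence; the file name, TMPDIR path and image tag below are placeholders, not the exact values used:

```bash
# Sketch: point TMPDIR at a disk-backed directory so the large import does not
# exhaust the memory-backed /tmp, then import and run with the AMD GPU devices.
mkdir -p "$HOME/tmpdir"
export TMPDIR="$HOME/tmpdir"
xz -dc rocm_sdk_cdna.tar.xz | docker import - rocm_sdk:cdna
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video \
    rocm_sdk:cdna /bin/bash
```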
Now I would need some location to upload it to.
Hello @lamikr, I checked Docker Hub and they have a limit of 2 GB. Since that is not enough for this container, I suggest that you create a compressed file and upload it to Google Drive. Each new Gmail account has 15 GB of free storage. Then you can make the folder public and share it with us. Thanks!
Hi @Said-Akbar, I did what you suggested :-) https://drive.google.com/drive/folders/1XnoSvL41XhrKT_5NbBSrUZ_1LaVpQ-xb Let me know how it works. I also put there a file with some instructions, as at least I needed to change the TMPDIR location during import to avoid running out of memory.
Thank you! I will try it tomorrow.
GitHub offers a free docker registry for public repos. The size limit is 10GiB per layer. |
Do you have experience with doing docker builds on GitHub, and what does "layer" mean here? I would think that the end-image size is at least 10-50 GB uncompressed, and compressed it would be about 2-6 GB. At the moment I am doing the build by using the files under docs/notes/containers in the repo.
I have split the Dockerfile there into multiple parts because I had some kind of image merge error at the end when I tried to build too many projects in one step. At the moment I first comment out everything in the Dockerfile after a certain line and then call a script, for example ./build_rocm_sdk_container_rdna3.sh. Once that step finishes and the image is saved, I uncomment the line asking to build the second set and then launch the ./build_rocm_sdk_container_rdna3.sh script again. Maybe there is some better/smarter way of doing the commits between RUN commands that I am missing, something similar to calling "podman tag" directly from the Dockerfile between RUN commands. Or alternatively there could be a script that adds those RUN commands dynamically to the Dockerfile after each successful build phase. That would be nice to try on GitHub.
Most (if not all) Dockerfile instructions result in a new layer being created. You may examine the layers of existing images using, for example, docker history.
I guess the size limit is applied to the compressed size of each layer, but I am not very sure.
That's a bad practice even if it succeeds in merging, since a massive image layer is very hard to upload/download. Splitting the build steps into multiple smaller layers is better. Use the layer id (hash) in FROM:
FROM 1234567890ab
# ... remaining steps ...
and build it.
For example:
FROM ubuntu:24.04
# ... omitted ...
WORKDIR /
RUN apt update && apt install -y git git-lfs sudo vim
RUN git clone --recursive https://github.com/lamikr/rocm_sdk_builder.git
WORKDIR /rocm_sdk_builder
RUN git checkout master
RUN ./docs/notes/containers/scripts/preconfig_ubuntu.sh
RUN ./docs/notes/containers/scripts/select_target_gpus.sh ${ENV_VAR__TARGET_GPU_CFG_FILE_ALL}
I noticed that your Dockerfile does some cleanup jobs in the final layers. That is useless, as all deleted files will remain in the previous layers (each layer, once created, is immutable).
Thanks for the suggestions... So basically you suggested that the Dockerfile itself contains only the commands for creating the very base image, and all the other "docker run" commands that follow are called from the shell script instead of being added to the Dockerfile itself. I did not realize it could be done that way; I had thought that I needed to modify the Dockerfile dynamically from the script between each build step by using "echo "MY command" >> Dockerfile". Error checking needs to be added between each step anyway so that the script stops if one of the steps fails. The reason for the cleanup task at the end is that this way the exported image created with "podman export" will be smaller; it will contain only the files that are in the image at the final step. The image that is now shared on the Google Drive fileshare was made in that way, and it reduced the size of the exported image by tens of gigabytes. The user of the docker image can get those files back anyway if they want by using babs.sh commands; that even allows updating and rebuilding the SDK partially.
The same single-layer result can also be achieved with a multi-stage build:
FROM ubuntu:24.04 AS builder
# ... all build steps ...
FROM scratch
COPY --from=builder / /
# Now you get a single-layer image
That being said, building a Docker image usually follows some well-known best practices.
If you'd like to set up GitHub Actions to build Docker images, there is a limit that "each job in a workflow can run for up to 6 hours of execution time". To work around the limit, splitting the Dockerfile is needed anyway so that the build workflow can be split into multiple jobs: several jobs build groups of dependencies, and they are finally aggregated in the jobs that build the last monsters (pytorch, etc.). This strategy has some other advantages, e.g. if two or more components do not depend on each other, they can be built in different jobs (thus on different GHA runners) simultaneously, cutting the total build time down greatly. The current build process of the project follows a linear dependency chain (components are built one by one). Could you make a (rough) dependency graph of all the components built by this project? Using such information, building some components simultaneously would be possible, and I am willing to help write proper Dockerfiles and GHA workflows.
Thanks for this guide. Sadly I've not managed to push beyond the 'build failed: roctracer' situation. It fails on 'hipGetDevicePropertiesR0600' with an undefined reference at MatrixTranspose_test.cpp:(.text+0x322) and again at MatrixTranspose_test.cpp:(.text+0x360). I'm building this on Debian, which is not throwing any compatibility issue afaik, but it does fail to build. It also seems the build script is not aligned with the actual code tree, in that I do not find env_rocm.sh in /opt/rocm/bin, for example, but I do find it under ./binfo/env etc. Please assist.
After running babs -rs the result was the same: the file env_rocm.sh was not found in /opt/rocm/bin before or after.
@commandline-be It should be, by default, in /opt/rocm_sdk_612/bin/env_rocm.sh (not in /opt/rocm/bin/env_rocm.sh, as the /opt/rocm folder is usually used by AMD's own ROCm builds). Can you check whether you have /opt/rocm_sdk_612/bin/env_rocm.sh? If yes, then you should also find some example apps to test, such as the HIP and OpenCL hello worlds.
If not, let's try smaller steps to find out what the problem is. The env-variable script should already be installed by the first package, so we can try to build only that one.
After that you should have the script installed, plus only a couple of other files.
@Rongronggg9 Thanks for the feedback. I will try to do the Dockerfile and GitHub Actions for a GitHub build now... I was thinking of a base image that would be built in the first step; it should stay under the time and space limit for a single layer.
And then for GitHub Actions, I could first build that base image with the Dockerfile and then run a single action that builds llvm with a separate command after that, to create a second layer.
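A sketch of that two-step idea (the image names and the build command run inside the container are placeholders):

```bash
# Sketch: 1) build the small base image from the Dockerfile, 2) run the heavy
# llvm build inside a container, 3) commit the result as a new image/layer.
docker build -t rocm_sdk_base -f Dockerfile .
docker run --name llvm_build rocm_sdk_base \
    bash -c "cd /rocm_sdk_builder && ./babs.sh -b <llvm-only build list>"   # placeholder target
docker commit llvm_build rocm_sdk_llvm
```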
It took a couple of iterations, but the action and the Dockerfile in the previous commit now seem to work for running the Dockerfile.
But at first I was getting an error.
I needed to add bash -c... so this now seems to work and llvm is building. Could I somehow continue from the image created by this action by creating another action, or do I just need to add all the "run: docker run" commands to this one?
Refer to https://github.com/Rongronggg9/wps-office-repack/blob/main/.github/workflows/repack.yml for a live example of a multi-job workflow. You need to upload the Docker image built in each job somehow and download it in the next job. In my example, the "fetch" job is the prerequisite of the jobs in the "repack" matrix. The bridge connecting these jobs is the "cache" step: each public GitHub repo has 10 GiB of GHA cache storage. The "cache" step in "fetch" uploads files to the GHA cache storage, and then the "cache" step in each job in the "repack" matrix downloads the uploaded files. That being said, this methodology may not be the best one for your GHA workflow. Docker has a feature called inline cache, which embeds build cache metadata into the pushed image with almost no impact on the image size. Thus, you can just push the intermediate image to GitHub Packages in each job and pull it in the next jobs.
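For example, a sketch of how each job could push its intermediate image with inline cache metadata so that the next job can reuse the already-built layers (the registry path is a placeholder):

```bash
# Sketch: build with BuildKit inline cache and push; the next job passes the
# same tag via --cache-from so finished layers are pulled instead of rebuilt.
export DOCKER_BUILDKIT=1
docker build --build-arg BUILDKIT_INLINE_CACHE=1 \
    --cache-from ghcr.io/OWNER/rocm_sdk_builder:stage1 \
    -t ghcr.io/OWNER/rocm_sdk_builder:stage1 .
docker push ghcr.io/OWNER/rocm_sdk_builder:stage1
```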
Doing this only adds steps to the same job.
@lamikr I have tried all that quite a few times already; now I tried the approach you suggested, same issue. To my understanding the failure originates from line 156 in MatrixTranspose.c, which throws an error on clang; commenting out line 156 does not fix anything on retrying.
Yes, what do you want to be done with the pull requests?
@commandline-be Sorry, I forgot to mention that I was able to reproduce the build error on roctracer with backtrace.h on Ubuntu in some situations. It has been fixed since January 12, so please update the sources and do a clean rebuild.
That should make sure that everything from the previous build has been cleaned and the new build is started from scratch.
Just in case this matters to someone: when using the gfx906 in a Linux VM it suffers from a known bug. This requires compiling a Linux kernel module named vendor-reset. It works well for the Radeon VII. You can find the module on GitHub; it builds with dkms as well.
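A hedged sketch of building and loading it with dkms, assuming the commonly used gnif/vendor-reset project:

```bash
# Sketch: build the vendor-reset module with dkms and load it.
git clone https://github.com/gnif/vendor-reset.git
cd vendor-reset
sudo dkms install .
sudo modprobe vendor-reset
```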
@commandline-be Fingers crossed that the build will now work for you with the Vega VII. So you are doing the build in a Linux VM?
Building now, pending results.
@Rongronggg9 The discussion related to the docker image has somewhat sidetracked the original Radeon VII issue, so I created a new issue for building the docker image using GitHub Actions. I hope you can help me there to fix the current action file, which fails in the upload phase.
Yeah, that would be great, so I can make real use of this card eventually. I have barely used it, but for some AI/ML tasks it still seems well suited.
@lamikr sorry to say, some difference but the same outcome.
[100%] Built target file_plugin
---- the commands I used from ./src/rocm_sdk_builders/:
sudo rm -rf /opt/rocm_sdk_612
I'm building on a recent Debian, which is somewhat like Ubuntu but not entirely.
==== LARGER CONTEXT
[ 82%] Linking HIP executable MatrixTranspose
workaround to solve build problem on debian until better fix is available. #175 Signed-off-by: Mika Laitio <[email protected]>
@commandline-be As a workaround, I disabled the building of the roctracer tests by default for now. Can you update the sources to get the updated roctracer and then try to build it again?
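A sketch of that update-and-rebuild sequence, using the babs.sh flags that appear elsewhere in this thread:

```bash
# Sketch: pull the updated sources (including the roctracer workaround) and rebuild.
./babs.sh -up    # update the source projects
./babs.sh -b     # continue the build
```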
@Said-Akbar Hi, have you had time to test the docker image?
Could this be because I'm not building as root? Thus far I've not had notable issues with functionality; at one point I even ran 3D games and benchmarks in the VM.
Sorry, I had an error in my fix. I pushed a new change again... Now ./babs.sh -up should work...
No worries, your work is much appreciated. I've erased the entire rocm_sdk_builder folder and started over from scratch.
Progress, it seems ...
[ 0%] Building C object external/llvm-project/llvm/lib/Support/BLAKE3/CMakeFiles/LLVMSupportBlake3.dir/blake3.c.o
and eventually
[ 0%] Built target MLIRTableGen
After apt install miopen-hip-dev and then babs -b again, I now get:
Dependencies file "external/llvm-project/llvm/lib/Support/CMakeFiles/LLVMSupport.dir/circular_raw_ostream.cpp.o.d" is newer than depends file "/home/user/src/rocm_sdk_builder/builddir/031_01_miopen_rocMLIR/external/llvm-project/llvm/lib/Support/CMakeFiles/LLVMSupport.dir/compiler_depend.internal".
and also, after apt install llvm-dev:
make[2]: Nothing to be done for 'external/llvm-project/llvm/tools/mlir/lib/TableGen/CMakeFiles/MLIRTableGen.dir/build'.
You should not need to install the ROCm deb files; those are most likely messing up the build somehow. I say that because in the AmdDeviceLibsIncGen.py file from your error message I see the following on line 25:
def generate(outputPath: Path, rocmPath: Path, libs: List[str]) -> None:
Can you print the output of the following commands?
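For instance, the following generic checks would show whether a system-wide ROCm installation is leaking into the build (these are illustrative, not necessarily the exact commands that were requested):

```bash
# Sketch: look for distro ROCm packages and stray ROCm paths that could
# confuse the rocmPath handling in AmdDeviceLibsIncGen.py.
dpkg -l | grep -i -E 'rocm|hip|miopen'   # distro-installed ROCm-related packages
ls -d /opt/rocm* 2>/dev/null             # existing ROCm install directories
which hipcc hipconfig                    # compiler wrappers found on PATH
echo "ROCM_PATH=$ROCM_PATH ROCM_HOME=$ROCM_HOME"
```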
Thanks for the informative feedback.
should I consider running a |
I've now removed anything ROCm-related that was installed on the OS by the package manager. The build now appears to continue. Thanks. I had assumed the build process would not ingest anything from the OS.
After a long build it now fails at pytorch. I report the output below, should it matter. I'm restarting the entire build after a clean (-rs).
cc1plus: all warnings being treated as errors
--- before this, FAILED is reported:
[41/2619] Building CXX object third_party/fbgemm/CMakeFiles/fbgemm_avx512.dir/src/UtilsAvx512.cc.o
Thanks, this was good information. It is hard to know what exactly caused that. Maybe there were some libraries or header files under the /usr directory that confused it. I wish it had been some environment variable like ROCM_HOME, because that I could have solved/fixed more easily by redefining it in the envsetup.sh that babs uses. Any chance you could check which deb files you removed? (history command)
@commandline-be If you are still seeing the same error, could you try to replace the line "unset CFLAGS" in src_projects/pytorch/build_rocm.sh and then run the build again, to see whether pytorch would now build OK?
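One hypothetical variant, assuming the goal is to keep fbgemm's warnings from being promoted to errors (this is a guess, not the author's actual fix):

```bash
# Hypothetical replacement for the "unset CFLAGS" line in
# src_projects/pytorch/build_rocm.sh; the flags below are an assumption.
export CFLAGS="-Wno-error"
export CXXFLAGS="-Wno-error"
```

Rerunning the build afterwards (for example with ./babs.sh -b) would show whether pytorch gets past the fbgemm failure.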
Owner of a Radeon VII card here; if I can help by testing code to run well on it, let me know.