gfx906 (AMD MI60) is failing on run_and_save_benchmarks.sh and llama.cpp #180
@lamikr, I am not sure how to fix the issue above. Please let me know if you have time to review it today.
Hi, unfortunately I do not have a gfx906 myself for debugging, so I only added some patches that would be needed at least to get it to build and start testing, and added its support as experimental. About your error: I have never seen that kind of error, but it could be some kind of misconfiguration in rocBLAS related to src_projects/rocBLAS/library/src/blas3/Tensile/Logic/asm_full/vega10/vega10_Cijk_Alik_Bljk_HB_GB.yaml. But let's first check a couple of basic issues step by step so I get some basic info.
/opt/rocm_sdk_612/docs/examples/hipcc/hello_world
Hello @lamikr,
tests:
OpenCL test:
By the way, gfx906 GPUs are 'Vega 20', not 'Vega 10'. I am not sure whether llama.cpp calls some instruction that does not exist on gfx906.
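A minimal sketch of the Vega 10 vs Vega 20 distinction, assuming Tensile's per-architecture logic directories are named after the chip (vega10 for gfx900, vega20 for gfx906). The table and the `tensile_logic_dir` helper are hypothetical illustrations, not rocBLAS code; they only show that a gfx906 target would normally be expected to pick up vega20 logic files rather than the vega10 YAML mentioned earlier.

```python
# Hypothetical table: gfx processor name -> Tensile logic subdirectory (naming assumed).
TENSILE_ARCH_DIRS = {
    "gfx900": "vega10",  # Vega 10 (e.g. MI25, Vega FE)
    "gfx906": "vega20",  # Vega 20 (MI50, MI60, Radeon VII)
}


def tensile_logic_dir(target_id: str) -> str:
    """Return the logic directory for a target such as 'gfx906:xnack-'."""
    base = target_id.split(":", 1)[0]  # drop sramecc/xnack feature suffixes
    try:
        return TENSILE_ARCH_DIRS[base]
    except KeyError:
        raise ValueError(f"no Tensile logic directory known for {base!r}") from None


print(tensile_logic_dir("gfx906:xnack-"))  # vega20
print(tensile_logic_dir("gfx900"))  # vega10
```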
Here is the app crash log:
Based on the app crash log, ROCm is not able to find the symbol table ('No symbol table info available.'). I am not sure what that means. Let me know. Thanks!
Thanks, good to see that the basic applications work. I will start my gfx906 build and try to figure out a fix for those build errors with llama.cpp.
Thank you! Looking forward to your updates.
Hi, I added more tracing to the clr component that is responsible for loading the .so files that can contain code object (CO) data.
After that, this command should print out much more debug output to show what's going on:
I also did the gfx906 build and can find the string causing the problem in these files:
@lamikr, thanks! Regarding the new changes in the repo, will I have to build everything from scratch, or only specific files in wip/rocm_sdk_builder_612_vega_testing? I spent 3 days and over 10 hours building the last version of this repo. I hope this change will not require building everything from scratch. Thanks!
OK, this time it took 1 hour to build. I am still seeing the llama.cpp error. This time the output has all the error logs you described above. I am attaching the error output here.
Here are the error logs from `AMD_COMGR_SAVE_TEMPS=1 AMD_COMGR_REDIRECT_LOGS=stdout AMD_COMGR_EMIT_VERBOSE_LOGS=1 ROCM_SDK_PRINTOUT_DEBUG_MESSAGES=1 ./run_and_save_benchmarks.sh >>benchmark_error_output.txt 2>&1`
No need to rebuild everything. If you have a working build and run "./babs.sh -up", it will check
So the next time you run ./babs.sh -b, it will only rebuild and install the changed projects.
@lamikr, yes, I used the commands you shared above.
It took 1 hour to compile. I am still seeing the same 'symbol not found' errors. Please refer to my comments above for the detailed error logs.
In comparison, here is my log for a successful llama.cpp launch with gfx1030 and the same parameters. The output is very similar except at the very end. I was expecting to see some errors in your case in the clang or lld build commands that it executes to build the model, but even those looked pretty much the same. ` amd_comgr_do_action: amd_comgr_do_action: amd_comgr_do_action: amd_comgr_do_action: amd_comgr_do_action: `
What about those grep commands on /opt/rocm_sdk_612/lib64/rocblas/library? One possibility is that this is some kind of xnack+/xnack- mismatch. I have not needed to debug that kind of problem myself, but basically, if the GPU runs code in xnack- mode and something else is built in xnack+ mode, the two are not compatible. I need to investigate this more.
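The grep step can also be scripted. Below is a minimal Python sketch that lists every file under a directory whose raw bytes contain a given string, roughly what `grep -rl` does. The function name is hypothetical, and the string to search for is whatever symbol name the error message reports.

```python
import os


def files_containing(root: str, needle: bytes) -> list[str]:
    """Return the sorted paths of files under `root` whose raw bytes contain `needle`.

    Roughly equivalent to `grep -rl`; each file is read fully into memory.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):  # skip broken symlinks and special files
                continue
            with open(path, "rb") as f:
                if needle in f.read():
                    hits.append(path)
    return sorted(hits)
```

For example, `files_containing("/opt/rocm_sdk_612/lib64/rocblas/library", b"<symbol from the error>")`, where the placeholder stands for the name printed in the 'Cannot find Symbol with name' message.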
Sure, here are the string matches:
and the llama.cpp symbol matches:
Is there a way to switch to xnack+ mode on ROCm/AMD MI60 GPUs? If not, I will wait for your update. Thanks! EDIT:
Sorry, as I do not remember seeing this kind of error myself, this is a little bit of guesswork for now to try to isolate the problem. Disabling comgr and hiprtc would be the next thing to try. I got the idea from: So, if you have time, can you try building MIOpen with the following options to see if anything changes: `-DMIOPEN_USE_COMGR=Off -DMIOPEN_USE_HIPRTC=Off` It can be done by opening And then rebuilding MIOpen: ./babs.sh --clean binfo/core/034_miopen.binfo
Thanks! I will try it today when I get back home.
OK, I was impatient and tried building MIOpen with the above changes.
It built without any errors.
Here is the full log with
@lamikr, I see the AMD MI50 (also gfx906) costs around $140 on eBay with shipping. Let me know if you are open to the idea of supporting gfx906. I am willing to ship one to you. Or, if you are in the Bay Area, I can lend you an MI50. This way it will be easier for you to debug and fix issues. Thanks!
@Said-Akbar Thank you for the suggestion; it would be great if I could borrow one of your gfx906 cards for a while for testing. I live in the Bay Area, but I also travel to San Francisco quite often if that's easier for you. Could you send me a private message via Gmail or LinkedIn? I just bought a gfx1010 on eBay to better test RDNA1 cards, so I would like to hold off for a while before purchasing a gfx906.
Sure, let me send you a LinkedIn message.
Hello @lamikr, quick update. I installed the default ROCm 6.2.4 libraries on my Ubuntu 24.04.
After that, I copied the 'library' folder from /opt/rocm-6.2.4/lib/rocblas/ to /opt/rocm_sdk_612/lib64/rocblas/ (I backed up the broken 'library' folder there by renaming it to 'library2'). Amazingly, llama.cpp works now.
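The folder swap described above can be scripted so the backup step is not forgotten. This is a minimal sketch, assuming the same layout as above (a `library` folder inside the rocblas directory, backed up as `library2`). The function name is hypothetical, and on a real install it needs write access to /opt.

```python
import os
import shutil


def swap_rocblas_library(src_lib: str, rocblas_dir: str, backup_name: str = "library2") -> str:
    """Replace rocblas_dir/library with a copy of src_lib, keeping the old folder as a backup.

    Returns the path of the backup folder.
    """
    dst_lib = os.path.join(rocblas_dir, "library")
    backup = os.path.join(rocblas_dir, backup_name)
    if os.path.exists(backup):
        # Refuse to overwrite an earlier backup.
        raise FileExistsError(f"backup already exists: {backup}")
    os.rename(dst_lib, backup)         # keep the current (broken) folder
    shutil.copytree(src_lib, dst_lib)  # copy in the replacement kernel/logic files
    return backup
```

For example: `swap_rocblas_library("/opt/rocm-6.2.4/lib/rocblas/library", "/opt/rocm_sdk_612/lib64/rocblas")`. This mixes rocBLAS kernel files from one ROCm build into another, so it is a diagnostic workaround rather than a fix.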
Here is a llama.cpp chat example:
In the browser:
Interestingly, if you do not use flash attention, the output is gibberish:
I am not sure whether this is a llama.cpp issue or a ROCm rocBLAS version mismatch. But I think you will be able to debug it better now (once I give you the MI50), since we clearly know the problem is in the /opt/rocm_sdk_612/lib64/rocblas/library files.
Here are two llama.cpp benchmarks. Without flash attention:
With flash attention:
So it is a bit faster without flash attention, but the output is gibberish. Flash attention slows text generation down a bit, but the output is readable. Also, splitting the model across two GPUs did not give any speed improvement.
Hi Said, and thanks for the coffee and the MI50 GPU loan!
I have two of them going into a build this week, so maybe I can help out too! I also have a somewhat faster build box, since it's a 24-core EPYC 7352. I'd be happy to give either of you access for testing once it's set up.

If I remember correctly, you need to enable Above 4G Decoding for these cards; maybe that is causing the crash. Also make sure you enable Resizable BAR and/or Smart Access Memory in your BIOS. You may also need SR-IOV enabled. The display output port does nothing with the Instinct VBIOS; apparently this is so the framebuffer doesn't waste some of the VRAM. There is a V420 VBIOS that may or may not work on these cards, though I have not seen any confirmation one way or the other online. V420 was never officially released. I'm not 100% sure the display output doesn't work on Linux; there are conflicting posts about this.

Anyway, these are just some guesses on my part, since you are already a bit ahead of me in setting up your hardware. For reference, your kernel output should look more like this. The other failures for them are VM passthrough related and not relevant for us. I also have 2 MI25s and a Vega FE I can start testing with at some point.
@cb88 Thanks for the info, nice to get more people working with this card. So far I have found surprisingly little documentation about these MI50 cards. It seems I need to connect the display via the iGPU to check the BIOS settings (I ordered a mini-DisplayPort to HDMI cable just before reading this...). So far I have only tested this over ssh. Here is the decoded probe error I am seeing for this GPU: [ 5.418447] ? __warn (kernel/panic.c:748)
Hello @lamikr, I have an Nvidia card for video output, since these AMD cards do not have working video outputs. I used this command to install the MI50/60 drivers on my Ubuntu 24.04, and it worked fine for me:
It took a day of struggling with BIOS updates, but the MI50 now shows up in rocminfo! It could have taken a long time to figure this out without your suggestions. I put my steps/struggles here just for reference in case they are useful for somebody else.

After the first BIOS update, the boot went into a never-ending reboot cycle without showing anything on the display. I suspect it failed to find proper settings for my DDR4. I was able to solve that by removing everything, resetting the CMOS settings, and then installing only one DDR module at first. The second BIOS update, to the latest version, then went smoothly, except that the system refused to boot from my NVMe drive, so I needed to reinstall grub. Then when I added the MI50 back, the display went black again... I fixed that by removing the MI50 again and then forcing the BIOS to use the display from my AMD iGPU.

In my case, the only one of those options I found in the BIOS was "Enable above 4G decoding". To my understanding, at least in my Gigabyte BIOS, that also enables the Smart Access Memory option. In addition to "Enable above 4G decoding", I only enabled "AMD HW virtualization support" and tuned the memory settings a little from the defaults. Now the system uses the iGPU for display, the MI50 driver probe also works, and I can see both cards in rocminfo. Here is the relevant part of dmesg: `Agent 2 Name: gfx906`
Great to hear that you made it work with your PC! Now you should be able to debug ROCm SDK related issues.
I hope you were able to install the blower fan and control its speed as well. When the MI50 overheats, the card throttles its performance.
Hello @lamikr, how is your experiment with llama.cpp and the MI50 going? Can you please share the commands to fix the vllm installation? Thanks!
I can also run any tests on the RVII and MI25.
I have been trying to debug the rocBLAS problem for a couple of days now but have not yet found the cause. I added debug output and can see that the .so files from rocBLAS are loaded, but I do not yet understand why it then starts complaining about a missing symbol. I have tested building rocBLAS and Tensile for gfx906, gfx906:xnack- and gfx906:xnack+, and I always get the same problem. I have also tested removing all my rocBLAS and Tensile patches that are needed for some other GPUs, and building rocBLAS and Tensile with the rocm-6.2.4 tags. I hope to find a solution this weekend...
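To reason about which of those gfx906, gfx906:xnack- and gfx906:xnack+ builds should load on a given card, here is a minimal sketch of the target ID compatibility rule as I understand it from the LLVM AMDGPU documentation. A feature left out of the code object's target ID means "any" and matches either device setting; an explicit + or - must equal the device's setting. The helper names are hypothetical, the device string is only an example, and the real runtime check in clr may differ in details.

```python
def parse_target_id(target_id: str) -> tuple[str, dict[str, str]]:
    """Split 'gfx906:sramecc+:xnack-' into ('gfx906', {'sramecc': '+', 'xnack': '-'})."""
    processor, *features = target_id.split(":")
    settings = {}
    for feat in features:
        if len(feat) < 2 or feat[-1] not in ("+", "-"):
            raise ValueError(f"bad feature {feat!r} in {target_id!r}")
        settings[feat[:-1]] = feat[-1]
    return processor, settings


def is_compatible(code_object_id: str, device_id: str) -> bool:
    """Return True if code built for `code_object_id` should load on `device_id`.

    A feature absent from the code object's ID means "any" and matches either
    device setting; an explicit '+' or '-' must equal the device's setting.
    """
    co_proc, co_feats = parse_target_id(code_object_id)
    dev_proc, dev_feats = parse_target_id(device_id)
    if co_proc != dev_proc:
        return False
    return all(dev_feats.get(name) == sign for name, sign in co_feats.items())


device = "gfx906:sramecc+:xnack-"  # example of a card running with xnack off
for build in ("gfx906", "gfx906:xnack-", "gfx906:xnack+"):
    print(build, is_compatible(build, device))
```

With xnack off on the device, the plain gfx906 and gfx906:xnack- builds report True and the xnack+ build reports False. Under this rule alone, the plain and xnack- builds should both have loaded, which is consistent with the missing symbol coming from somewhere other than an xnack mismatch.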
Is it the same problem they are talking about here? ROCm/rocBLAS#1455
Hello @lamikr, I see you closed the bug. What was the issue here?
Hi @Said-Akbar, it seems this was closed automatically once I pushed one fix in. I will reopen it, as there is still more work to do. The original problem with the missing symbol is now fixed, and you should be able to get the fix by running these commands:
(babs.sh -ca may be needed, as I found a bug in the "-up" command that I only fixed today in the latest version.) llama.cpp is now working, at least for me, with the MI50.
|
The benchmark still has some problems with flash attention. SDPBackend.MATH works, but flash attention reports a ridiculously small time, indicating some type of error that I have not yet been able to solve.
|
Time for some ranting about this idea of optionally making the xnack and sramecc parameters part of the product name... It just seems that combinations like "xnack-", "xnack+", "sramecc-:xnack-" and others are very easy to get wrong, either in the code or as build parameters. At the code level, most GPUs identify themselves with just a simple name like "gfx1010", "gfx1100", etc., and then there are these GCN devices that choose to behave differently.

That would be somewhat manageable if these features were specified only as build-time parameters, in a way that all projects then handle properly. Unfortunately, in reality there is no consistency. Some projects can be built with either the plain name or the name with these features as an extra parameter. And then there are projects like MIOpen that add these parameters silently at runtime (if name=gfx906, then name=gfx906:xnack-). And while internally the sramecc part also seems to be included in the product name, in the style "gfx906:sramecc+:xnack-", inconsistently sramecc does not seem to be allowed as part of the name in build parameters.

I think embedding the sramecc/xnack parameters in the name may have sounded like a clever idea a long time ago, but in reality it seems to cause confusion and bugs. It makes it very hard to know whether some bug is caused by code that fails to compare GPU names consistently when those extra parameters are or are not included.
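A toy illustration of the comparison problem described above. It assumes a runtime rewrite like the MIOpen one (bare "gfx906" silently becomes "gfx906:xnack-"); `miopen_style_name`, `base_name` and `same_gpu` are hypothetical helpers, not MIOpen code. A plain string comparison then fails even though both names refer to the same GPU.

```python
def miopen_style_name(name: str) -> str:
    """Hypothetical mimic of a silent runtime rewrite: bare 'gfx906' gains ':xnack-'."""
    return "gfx906:xnack-" if name == "gfx906" else name


def base_name(target_id: str) -> str:
    """Drop ':sramecc+/-' and ':xnack+/-' suffixes, keeping only the processor name."""
    return target_id.split(":", 1)[0]


def same_gpu(a: str, b: str) -> bool:
    """Compare two target names while ignoring their feature suffixes."""
    return base_name(a) == base_name(b)


built_for = "gfx906"                           # name given at build time
seen_at_runtime = miopen_style_name("gfx906")  # name after the rewrite
print(built_for == seen_at_runtime)            # False: naive comparison misses
print(same_gpu(built_for, seen_at_runtime))    # True
```

Ignoring the suffixes is only safe when xnack/sramecc compatibility is checked somewhere else; otherwise an xnack+ build would also "match" an xnack- device.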
Why not build with xnack+? It is supported on gfx906 and can be a significant performance improvement for things that need it. xnack- disables xnack support, which is probably undesirable; if you build with xnack+, everything should work whether it needs it or not. https://niconiconi.neocities.org/tech-notes/xnack-on-amd-gpus/ See there for the possible definitions: for gfx906 it is sramecc+/- and xnack+/-; there is no sram+ option. SRAMECC is probably only desirable if you are running long scientific compute tasks, and probably irrelevant for AI work. But if disabling it gets in the way, it could be enabled; I think it will just cause some memory overhead.
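For reference, a small sketch that enumerates the target IDs these two features allow for gfx906: each of sramecc and xnack can be left out, "+" or "-". It assumes the canonical LLVM ordering in which features appear alphabetically (sramecc before xnack, as in "gfx906:sramecc+:xnack-"). The `target_ids` helper is hypothetical, and which of these spellings a given project accepts as a build parameter is a separate question.

```python
from itertools import product


def target_ids(processor: str, features: list[str]) -> list[str]:
    """List every target ID for `processor` where each feature is absent, '+' or '-'.

    Features are emitted in alphabetical order, e.g. 'gfx906:sramecc+:xnack-'.
    """
    names = sorted(features)
    ids = []
    for signs in product(("", "+", "-"), repeat=len(names)):
        parts = [processor] + [name + sign for name, sign in zip(names, signs) if sign]
        ids.append(":".join(parts))
    return ids


for tid in target_ids("gfx906", ["xnack", "sramecc"]):
    print(tid)
```

This prints nine names, from plain "gfx906" to "gfx906:sramecc-:xnack-".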
I wonder why the rocBLAS binaries then do not enable xnack by default. In the rocBLAS CMakeLists.txt they have "gfx906:xnack-". If I build with just "gfx906" as the target and add one of my trace patches to clr, I see this in the output:
Thanks for the fix!
Interesting. Phi-3-mini is a 3.8 billion parameter model. You should see at least ~60 tokens per second of generation on the MI50. I see you are using Thanks!
Also, as @cb88 mentioned, xnack should increase the GPU's performance, but I am not sure how to compile the repo with xnack enabled.
But this does not affect my GPU's inference speed, since the code compiled for ROCm did not have xnack enabled.
Have you been able to build and test it yourself now? The application/script I use is in folder If you have multiple GPUs, you can control which one to use by uncommenting the line
-ngl 99 should not cause a crash. Phi-3-mini fits fully in VRAM; typically you use -ngl 9999 and it just works as long as you have enough VRAM. If it runs out of VRAM, it may crash.
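On the -ngl point, here is a minimal sketch of how I understand llama.cpp to interpret the value, based on its model loader; details may differ between versions. The number of offloaded repeating layers is clamped to the model's layer count, and the output layer also goes to the GPU once -ngl exceeds that count. So -ngl 99 and -ngl 9999 request the same placement for any model with fewer than 99 layers. `offload_plan` is a hypothetical helper and the 32-layer count is only an example value.

```python
def offload_plan(n_gpu_layers: int, n_layer: int) -> tuple[int, bool]:
    """Approximate llama.cpp's split for a given -ngl value.

    Returns (repeating layers placed on the GPU, whether the output layer is on the GPU).
    Values larger than the model are clamped rather than rejected.
    """
    on_gpu = max(0, min(n_gpu_layers, n_layer))
    output_on_gpu = n_gpu_layers > n_layer
    return on_gpu, output_on_gpu


N_LAYER = 32  # example layer count, assumed for illustration
print(offload_plan(99, N_LAYER))    # (32, True)
print(offload_plan(9999, N_LAYER))  # (32, True)
print(offload_plan(0, N_LAYER))     # (0, False)
```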
In principle, you should be able to build for "gfx906:xnack+" by adding that line to build_cfg.user manually and then doing a clean build. There may be 1-2 apps that fail to build for now, because they only accept "gfx906" as a parameter, without the xnack suffix. I can fix those later by adding a filter to their binfo files that checks the target names (a little similar to what is now in binfo/core/038_aotriton.binfo). Until that is in place, you need to temporarily change build_cfg.user to have only "gfx906", rebuild only that app, then change the line back to "gfx906:xnack+" and continue building the rest of the apps.
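The target-name filter mentioned above could look roughly like this. The real binfo files are shell scripts, and the existing aotriton filter is not reproduced here. This is a hypothetical Python sketch of the same idea: it reduces the configured targets to bare names and drops duplicates, for apps that only accept "gfx906".

```python
def plain_targets(targets: list[str]) -> list[str]:
    """Reduce build targets such as 'gfx906:xnack+' to bare names like 'gfx906'.

    Intended for projects that reject feature suffixes. Order is preserved,
    and duplicates (e.g. 'gfx906:xnack+' and 'gfx906:xnack-') collapse to one.
    """
    seen = []
    for target in targets:
        base = target.split(":", 1)[0]
        if base not in seen:
            seen.append(base)
    return seen


print(plain_targets(["gfx906:xnack+", "gfx1030", "gfx906:xnack-"]))
# ['gfx906', 'gfx1030']
```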
I did a triton and vllm fix; vllm now works for me without crashing on stable diffusion, where it uses triton to build it. docs/example/llm/vllm
Thank you @lamikr! I tested the latest changes in triton and vllm. They are working now!
Hi @lamikr,
I built rocm_sdk_builder on a freshly installed Ubuntu 24.04.1. It took 5 hours, 120GB of storage and many hours of fixing small issues during the build (reference: #175).
I also chose gfx906 in `./babs.sh -c`. When I ran `./run_and_save_benchmarks.sh`, I got this message. Note the error at the bottom, 'Cannot find Symbol with name'. I thought this would not be an issue with llama.cpp.
However, I got a similar error in llama.cpp as well (I built it using `./babs.sh -b binfo/extra/ai_tools.blist`). Note that this llama.cpp build works on the CPU when I do not set the ngl parameter (layer offloading). Please let me know if there is a fix.