unable to open hip GPU device (gfx1030) with latest AOMP #187
Comments
I am in both the video and render groups: id
I suggested
Yup. I tried setting the env var before building my sample with hipcc, and that didn't help. Rebuilding all of AOMP with that env var set doesn't help either. Happy to gather any other relevant debug information. To make sure nothing in /opt/rocm interferes, I only have rocm-smi there: ls -ltr /opt/rocm-4.0.0/lib/* If required I can rebuild that too, but I doubt that is the issue. Thanks for your quick responses.
Ah. I didn't notice you were using hipcc. When I try to run HIP code locally, I get a variant of 'no devices found', which seems to correlate with an invalid branch in the HIP runtime. Running the host application under valgrind blames libamdhip64.so, at least. Hopefully Greg has more information on that; I haven't tried to debug the HIP runtime.
I'm just getting familiar with the runtimes. What other runtime can I use? I am trying to get Tensile going on gfx1030, which seems to require hipcc. Yeah, gdb points to libamdhip64.so too.
The bottom of the stack on Linux is kfd (in the Linux kernel), then roct, which is roughly the userspace driver part of kfd. On top of that is rocr, an implementation of the HSA spec. Those have all been robust under my testing. The OpenMP implementation on amdgpu builds directly on top of rocr for that reason. Depending on your use case, C++ compiled for amdgcn as freestanding and launched using the functions in hsa.h works well. OpenCL has its own runtime, but it looks like it's now built on the same foundation as HIP, so it may have the same bug reported here. Windows does some different things, and so does the graphics stack. libamdhip64.so contains, as far as I can tell, roct, rocr, rocclr, and hip. Something in that appears to be broken. There's a lot of code, though, so it's not an easy fix. HIP mostly tracks errors through an internal Jira system. Is Tensile the ROCm library with that name? If so, an issue suggests it worked on a gfx1010 in November. You might therefore be able to get a working HIP installation by rolling back to a release made around then. I've added Siu Chi to this issue as he is much closer to the HIP development than I am.
Cool. Thanks for the clarity. There are so many rocXX libs that it was hard to understand the layering. I think C++ compiled for amdgcn and launched with hsa.h is best for us. I will look around for rocr samples as a starting point. I was trying to get Tensile up and running on gfx1030 because those are the "baseline" GEMM routines for rocblas, and we want to compare against that performance too. I filed a few issues about it: ROCm/Tensile#1282
Unfortunately, it looks like the last release of rocr was 3.1.x and there are no 4.x or later branches: ROCm/ROCR-Runtime#111 Are you able to test with the open-source rocr from https://github.com/RadeonOpenCompute/ROCR-Runtime? Any chance we can get an updated rocr, or is 3.1.x supposed to work for gfx10?
OK, so rocr seems to be working. I verified that with rocm_bandwidth_test (https://github.com/RadeonOpenCompute/rocm_bandwidth_test), since ROCR-Runtime doesn't have any tests. So something is broken along rocclr / hip for gfx10. ./rocm-bandwidth-test
Thanks for the pointers.
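The narrowing-down reasoning in the comments above (rocr verified, so the fault lies in the layers above it) can be sketched as follows. This is only an illustration of the layering as described in this thread; the hip -> rocclr -> rocr chain is a commenter's "as far as I can tell" reading, and the names `BUILDS_ON`, `chain` and `suspects` are hypothetical.

```python
# Each component maps to the component it is built on (None = bottom of the stack).
# Layering taken from the discussion in this thread, not from official documentation.
BUILDS_ON = {
    "kfd": None,       # Linux kernel driver
    "roct": "kfd",     # userspace driver part of kfd
    "rocr": "roct",    # implementation of the HSA spec
    "rocclr": "rocr",  # assumed to sit between rocr and hip
    "hip": "rocclr",
    "openmp": "rocr",  # OpenMP on amdgpu builds directly on rocr
}


def chain(component):
    """Return the component and everything beneath it, top to bottom."""
    layers = []
    while component is not None:
        layers.append(component)
        component = BUILDS_ON[component]
    return layers


def suspects(failing, known_good):
    """Layers under `failing` that a working `known_good` layer does not vouch for."""
    verified = set(chain(known_good))
    return [layer for layer in chain(failing) if layer not in verified]


print(suspects("hip", "rocr"))  # ['hip', 'rocclr']
```

With rocr confirmed by rocm_bandwidth_test, only hip and rocclr remain as candidates, which matches the conclusion reached above.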
@JonChesterfield do you have any examples / tests that do the "c++ compiled for amdgcn as freestanding and launched using the functions in hsa.h"? I am trying to follow https://github.com/RadeonOpenCompute/rocminfo as an example, but I don't see gcn binaries in the final ELF file that goes into the rocr / HSA runtime. Update: found https://github.com/ROCm-Developer-Tools/LLVM-AMDGPU-Assembler-Extra to play around with. Update 2: I have been able to run simple code after updating to code object version 3. Pushed a fork: https://github.com/Powderluv/LLVM-AMDGPU-Assembler-Extra
Hey. I missed the above comments but saw this while looking at the tangentially related #193. I'm not clear what the status of the gfx10 cards is; the 4.1 release notes don't seem to mention them. Unofficially, some code does seem to run on them, and I believe rocr and the compiler backend are functional. OpenMP does not work on gfx10 yet; that is being worked on at present. The code object format is currently transitioning from v3 to v4. I think the status is: ROCm 3.10 needs v3, ROCm 4.1 can use v4, and LLVM trunk is reviewing patches to bring v4 online. Using raw C++ means trading the many conveniences of the high-level languages for an increase in control. Documentation is sparse; your mileage may vary. Nevertheless, an example of going down that rabbit hole is https://github.com/jonChesterfield/hostrpc, a bare-metal prototype that I'm hoping to implement libc on top of (thus getting away from freestanding for applications). You may find it interesting, but it's not production code yet. A freestanding compile invocation is along the lines of: To get something that can be launched, one currently needs to use OpenCL/HIP/OpenMP/IR/asm, as the kernel calling convention is not exposed to C++. That's somewhat annoying, but the 'kernel' function only needs to contain a call to something written in C. E.g.:
Given some IR that contains one or more kernel functions, llc can emit a code object which the HSA loader can run on the GPU. The interface to that is RadeonOpenCompute/ROCR-Runtime/src/inc/hsa.h. It's verbose, but works broadly as the comments suggest.
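The code object version status mentioned above can be written down as a small lookup. This is a sketch built only from the two data points given in this comment, which the commenter prefaced with "I think"; the function name `code_object_version` is hypothetical and any other release deliberately returns `None`.

```python
# Only the two pairings mentioned in this thread; everything else is unknown.
CODE_OBJECT_VERSION = {
    (3, 10): 3,  # "rocm 3.10 needs v3"
    (4, 1): 4,   # "rocm 4.1 can use v4"
}


def code_object_version(rocm_release):
    """Map a 'major.minor[.patch]' ROCm release string to a code object version.

    Returns None when the thread gives no information for that release.
    """
    major, minor = (int(part) for part in rocm_release.split(".")[:2])
    return CODE_OBJECT_VERSION.get((major, minor))
```

Parsing the release into integers matters here: ROCm 3.10 is a different release from 3.1, and comparing them as decimal numbers would conflate the two.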
Thank you for this. hostrpc seems very useful. We will give it a spin and post issues here or on the hostrpc repo. A libc would also be fantastic, along with some utilities for debugging and logging.
OpenMP team, what is the status of AOMP on gfx1030? Should we get a test machine in our AOMP lab?
FYI, ROCm/ROCm#887 (comment)
RocmBandwidthTest Version: 2.6.0 / rocm-5.1.2 gfx1030 / uname = 5.4.0-122-generic
@powderluv Do you still need assistance with this ticket? If not, please close the ticket. Thanks! |
I have built the latest AOMP (SHA: e2f40a7) from the amd-stg-open branch. However, it is unable to enumerate the HIP GPU device, even though rocminfo shows both the CPU and the GPU. I have a 6900XT (gfx1030) and am trying to get Tensile to work on it.
(I have ROCm/HIP#2219 applied locally to fix the clang_rt builtin issue on hosts.)
See below:
I am running this code:
https://gitlab.com/cscs-ci/ci-testing/ault-amdgpu/-/blob/master/helloworld.cpp
I get the error hipErrorNoDevice.
I verified I am in the video group and sudo doesn't help.
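One thing worth ruling out when hipErrorNoDevice appears is whether the current user can open the device nodes at all. The following is a minimal sketch, assuming the usual ROCm node locations `/dev/kfd` and `/dev/dri/renderD*` (these paths are not stated in the report above) and assuming the runtime needs read-write access to them. Since sudo did not help here, permissions are probably not the cause in this case; the check only eliminates one possibility.

```python
import glob
import os


def node_status(path):
    """Classify one device node as 'missing', 'ok', or 'no access'."""
    if not os.path.exists(path):
        return "missing"
    # Assumption: the compute runtime opens these nodes read-write.
    if os.access(path, os.R_OK | os.W_OK):
        return "ok"
    return "no access"


def check_nodes(paths):
    """Return a {path: status} mapping for every path given."""
    return {path: node_status(path) for path in paths}


if __name__ == "__main__":
    # Assumed node locations; adjust if the system uses different ones.
    nodes = ["/dev/kfd"] + sorted(glob.glob("/dev/dri/renderD*"))
    for path, status in check_nodes(nodes).items():
        print(f"{path}: {status}")
```

A "no access" result would point at group membership (video/render), while "missing" would suggest the kernel driver did not create the node.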