
Unable to run KernelAbstractions.jl on CUDA H100 #39

Open
alexishuante opened this issue Oct 4, 2024 · 0 comments
Hello,

I currently have an issue where I cannot run the KernelAbstractions.jl code on CUDA (H100). The same code runs on an A100 without any problem. I already tried updating the packages, but they are held back by compatibility constraints. I also tried changing the CUDA runtime version several times, with no success. Any advice/help would be greatly appreciated. Thank you!

I am compiling/running the code as follows:

julia --project=KernelAbstractions -e 'import Pkg; Pkg.instantiate()'
julia --project=KernelAbstractions src/KernelAbstractions.jl

CUDA.versioninfo():

CUDA runtime 11.8, artifact installation
CUDA driver 12.4
NVIDIA driver 550.90.7

Libraries:

  • CUBLAS: 11.11.3
  • CURAND: 10.3.0
  • CUFFT: 10.9.0
  • CUSOLVER: 11.4.1
  • CUSPARSE: 11.7.5
  • CUPTI: 18.0.0
  • NVML: 12.0.0+550.90.7

Toolchain:

  • Julia: 1.9.1
  • LLVM: 14.0.6
  • PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
  • Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

2 devices:
0: NVIDIA H100 NVL (sm_90, 93.000 GiB / 93.584 GiB available)
1: NVIDIA H100 NVL (sm_90, 93.000 GiB / 93.584 GiB available)

Pkg status:

⌅ [21141c5a] AMDGPU v0.4.8
[c7e460c6] ArgParse v1.2.0
⌅ [052768ef] CUDA v4.0.1
[72cfdca4] CUDAKernels v0.4.7
⌅ [63c18a36] KernelAbstractions v0.8.6
[d96e819e] Parameters v0.12.3
[7eb9e9f0] ROCKernels v0.3.5
[90137ffa] StaticArrays v1.9.7
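
The ⌅ markers in the status above mean those packages are held back by compatibility constraints. A minimal sketch of how the pins could be inspected and (possibly) lifted, assuming the project's Project.toml has no explicit [compat] entry blocking it (package names taken from the status above):

```julia
import Pkg
Pkg.activate("KernelAbstractions")
# Explain why the held-back (⌅) packages cannot move to newer versions
Pkg.status(outdated = true)
# Try to update just the GPU stack; this only succeeds if no other
# dependency pins CUDA.jl below a release with sm_90 (H100) support
Pkg.update(["CUDA", "KernelAbstractions"])
```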

This is the error I am getting:

ERROR: LoadError: CUDA error: device kernel image is invalid (code 300, ERROR_INVALID_SOURCE)
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA ./strings/substring.jl:222
[2] CuModule(data::Vector{UInt8}, options::Dict{CUDA.CUjit_option_enum, Any})
@ CUDA ~/.julia/packages/CUDA/ZdCxS/lib/cudadrv/module.jl:60
[3] CuModule
@ ~/.julia/packages/CUDA/ZdCxS/lib/cudadrv/module.jl:23 [inlined]
[4] cufunction_link(job::GPUCompiler.CompilerJob, compiled::NamedTuple{(:image, :entry, :external_gvars), Tuple{Vector{UInt8}, String, Vector{String}}})
@ CUDA ~/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:488
[5] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
@ GPUCompiler ~/.julia/packages/GPUCompiler/S3TWf/src/cache.jl:95
[6] cufunction(f::typeof(gpu_fasten_main), tt::Type{Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.StaticSize{(16384,)}, KernelAbstractions.NDIteration.DynamicCheck, Nothing, Nothing, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.StaticSize{(256,)}, KernelAbstractions.NDIteration.StaticSize{(64,)}, Nothing, Nothing}}, Int64, Int64, CuDeviceVector{Atom, 1}, CuDeviceVector{Atom, 1}, CuDeviceVector{FFParams, 1}, CuDeviceMatrix{Float32, 1}, CuDeviceVector{Float32, 1}, Val{4}}}; name::Nothing, always_inline::Bool, kwargs::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:maxthreads,), Tuple{Int64}}})
@ CUDA ~/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:306
[7] macro expansion
@ ~/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:102 [inlined]
[8] (::KernelAbstractions.Kernel{CUDADevice{false, false}, KernelAbstractions.NDIteration.StaticSize{(64,)}, KernelAbstractions.NDIteration.StaticSize{(16384,)}, typeof(gpu_fasten_main)})(::Int64, ::Vararg{Any}; ndrange::Int64, dependencies::CUDAKernels.CudaEvent, workgroupsize::Nothing, progress::Function)
@ CUDAKernels ~/.julia/packages/CUDAKernels/3IKLV/src/CUDAKernels.jl:283
[9] run(params::Params, deck::Deck, device::Tuple{CuDevice, String, Backend})
@ Main ~/Fall2024/miniBUDE/src/julia/miniBUDE.jl/src/KernelAbstractions.jl:146
[10] main()
@ Main ~/Fall2024/miniBUDE/src/julia/miniBUDE.jl/src/BUDE.jl:222
[11] top-level scope
@ ~/Fall2024/miniBUDE/src/julia/miniBUDE.jl/src/KernelAbstractions.jl:313
in expression starting at /home/xa2/Fall2024/miniBUDE/src/julia/miniBUDE.jl/src/KernelAbstractions.jl:313
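The failure looks consistent with the toolchain reported above: CUDA.versioninfo() lists device capability support only up to sm_86, while the H100 is sm_90, so linking the compiled kernel image fails with ERROR_INVALID_SOURCE. A hedged way to confirm the mismatch on the machine, using the CUDA.jl API (requires a working GPU session):

```julia
using CUDA
# Compute capability of the active device; an H100 reports v"9.0" (sm_90).
# If this exceeds the highest capability listed under "Device capability
# support" in CUDA.versioninfo(), kernels cannot be loaded on this GPU.
@show capability(device())
```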
