Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a new ReactantState architecture #4071

Merged
merged 34 commits into from
Feb 6, 2025
Merged

Implement a new ReactantState architecture #4071

merged 34 commits into from
Feb 6, 2025

Conversation

wsmoses
Copy link
Collaborator

@wsmoses wsmoses commented Jan 29, 2025

This allows us to use Reactant's traced arrays with Oceananigans, which permits optimization via XLA.

#####
##### These methods are extended in DistributedComputations.jl
#####

device(::CPU) = KernelAbstractions.CPU()
device(::GPU) = CUDA.CUDABackend(; always_inline=true)
# While there is no Reactant backend this worls
device(::ReactantState) = CUDA.CUDABackend(; always_inline=true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to let the backend be a property of ReactantState?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently this is used to say we are importing CUDA kernel source code to reactant, which is the current correct default (and should be decided in the KA reactant backend when ready).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I see, I had a misconception about the meaning of "backend" and "device"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess a comment would be appropriate here to describe what is happening

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment

automatic:
- exit_status: 1
limit: 1
depends_on: "init_cpu"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to have CPU tests if the backend is hardcoded for GPU? I may be missing something…

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not hardcoded for GPU, counterintuitively.

Reactant is taking in the CUDA kernels but then deciding to execute it on whatever the reactant default backend is (which could be CPU or TPU — we just landed the GPU kernel to CPU kernel support last night. TPU in progress).

@simone-silvestri
Copy link
Collaborator

we could introduce this change as part of this PR: #3475 (comment)_

@simone-silvestri
Copy link
Collaborator

We are also missing a method for

@inline convert_args(::CPU, args) = args

necessary for split explicit free surface.
Also over here we can just use the KA equivalent instead of manually converting

@simone-silvestri
Copy link
Collaborator

Also over here we can just use the KA equivalent instead of manually converting

for reference
https://github.com/JuliaGPU/KernelAbstractions.jl/blob/8a87f77e0e8f4d435006ab73185817f4b1b83dbe/src/KernelAbstractions.jl#L789

@glwagner
Copy link
Member

We are also missing a method for

@inline convert_args(::CPU, args) = args

necessary for split explicit free surface.
Also over here we can just use the KA equivalent instead of manually converting

Can we improve that interface? It's pretty unclear what exactly we are "converting" to. Shoiuld it be convert_to_device or something? What does "args" mean?

@glwagner
Copy link
Member

@jm-c can you please approve this PR so that it can be merged?

@simone-silvestri
Copy link
Collaborator

@jm-c can you please approve this PR so that it can be merged?

I would wait a second, there are still some items to be solved

@simone-silvestri
Copy link
Collaborator

@jm-c can you please approve this PR so that it can be merged?

I would wait a second, there are still some items to be solved

Sorry just realized this must have been a confusion with #4070 😅

#####
##### These methods are extended in DistributedComputations.jl
#####

device(::CPU) = KernelAbstractions.CPU()
device(::GPU) = CUDA.CUDABackend(; always_inline=true)
# While there is no Reactant backend this worls
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# While there is no Reactant backend this worls
# Temporary fix before we develop a Reactant backend

@glwagner
Copy link
Member

@jm-c can you please approve this PR so that it can be merged?

I would wait a second, there are still some items to be solved

Sorry just realized this must have been a confusion with #4070 😅

what the heck. no wonder #4070 didn't get approved HAHA

@glwagner glwagner changed the title Reactant ext Implement a new ReactantState architecture Jan 31, 2025
@wsmoses wsmoses merged commit c1014d0 into main Feb 6, 2025
50 checks passed
@giordano
Copy link

giordano commented Feb 6, 2025

With Reactant#main + Oceananigans#main I get

ERROR: The following 1 direct dependency failed to precompile:

OceananigansReactantExt

Failed to precompile OceananigansReactantExt [abc3f31f-7b77-5c83-815b-0e826f781516] to "/home/giordano/.julia/compiled/v1.11/OceananigansReactantExt/jl_0QuYuA".
AssertionError("Could not find registered platform with name: \"cuda\". Available platform names are: ")
ERROR: LoadError: type Nothing has no field ReactantBackend
Stacktrace:
 [1] getproperty(x::Nothing, f::Symbol)
   @ Base ./Base.jl:49
 [2] top-level scope
   @ ~/.julia/packages/Oceananigans/BuM1I/ext/OceananigansReactantExt/Architectures.jl:11
 [3] include(mod::Module, _path::String)
   @ Base ./Base.jl:557
 [4] include(x::String)
   @ OceananigansReactantExt ~/.julia/packages/Oceananigans/BuM1I/ext/OceananigansReactantExt/OceananigansReactantExt.jl:1
 [5] top-level scope
   @ ~/.julia/packages/Oceananigans/BuM1I/ext/OceananigansReactantExt/OceananigansReactantExt.jl:5
 [6] include
   @ ./Base.jl:557 [inlined]
 [7] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::Nothing)
   @ Base ./loading.jl:2881
 [8] top-level scope
   @ stdin:6
in expression starting at /home/giordano/.julia/packages/Oceananigans/BuM1I/ext/OceananigansReactantExt/Architectures.jl:1
in expression starting at /home/giordano/.julia/packages/Oceananigans/BuM1I/ext/OceananigansReactantExt/OceananigansReactantExt.jl:1

@glwagner
Copy link
Member

glwagner commented Feb 6, 2025

Ok so this is a bug we get on a machine with no GPU? We can set up a test on the GitHub actions pipeline to catch?

Copy link
Collaborator

@tomchor tomchor Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this was already merged, but should these commented lines be here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see the lines you're referring to

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably

# include("Utils.jl")
# include("BoundaryConditions.jl")
# include("Fields.jl")
# include("MultiRegion.jl")
# include("Solvers.jl")
using .Architectures
# using .Utils
# using .BoundaryConditions
# using .Fields
# using .MultiRegion
# using .Solvers

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly they were notes-to-self for @wsmoses. I can elaborate on the comments in a subsequent PR. Which I think we will need to set up tests on a nonGPU machine per @giordano's issues

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sorry, these were indeed the lines. I accidentally tagged the wrong thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah so these are "additional modules that may need to be extended", helpful for people working on compat who aren't intimately familiar with the internal structure of Oceananigans. I can add that note in a subsequent PR

@wsmoses
Copy link
Collaborator Author

wsmoses commented Feb 6, 2025

No this doesn't have anything to do with CPU.

The issue is that

const ReactantKernelAbstractionsExt = Base.get_extension(
returns nothing.

How did you do your package setup?

Maybe we should do a release to fix

cc @gbaraldi

@wsmoses wsmoses deleted the reactant branch February 6, 2025 15:34
@tomchor
Copy link
Collaborator

tomchor commented Feb 6, 2025

I'm just becoming aware of this PR (after it was merged) and I'm wondering if the top comment can be updated with some context. I guess ReactantState has to do with https://github.com/EnzymeAD/Reactant.jl ?

A general comment is that I think we should avoid PRs with no context/explanation except for very small self-explanatory cases. (Fixing a typo for example)

@CliMA CliMA deleted a comment from giordano Feb 6, 2025
@glwagner
Copy link
Member

glwagner commented Feb 6, 2025

Oops, I didn't mean to delete your comment @giordano !

@glwagner
Copy link
Member

glwagner commented Feb 6, 2025

No this doesn't have anything to do with CPU.

The issue is that

const ReactantKernelAbstractionsExt = Base.get_extension(

returns nothing.
How did you do your package setup?

Maybe we should do a release to fix

cc @gbaraldi

I get something similar

Reactant Super Simple Simulation Tests: Error During Test at /Users/gregorywagner/Projects/dev/Oceananigans.jl/test/test_reactant.jl:16
  Got exception outside of a @test
  ArgumentError: No available targets are compatible with triple "nvptx64-nvidia-cuda"
  Stacktrace:
    [1] LLVM.Target(; name::Nothing, triple::String)
      @ LLVM ~/.julia/packages/LLVM/b3kFs/src/target.jl:33
    [2] Target
      @ ~/.julia/packages/LLVM/b3kFs/src/target.jl:23 [inlined]
    [3] llvm_machine(target::GPUCompiler.PTXCompilerTarget)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/ptx.jl:49
    [4] macro expansion
      @ ~/.julia/packages/GPUCompiler/2CW9L/src/ptx.jl:140 [inlined]
    [5] macro expansion
      @ ~/.julia/packages/LLVM/b3kFs/src/base.jl:97 [inlined]
    [6] finish_module!(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget}, mod::LLVM.Module, entry::LLVM.Function)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/ptx.jl:137
    [7] finish_module!(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, mod::LLVM.Module, entry::LLVM.Function)
      @ CUDA ~/.julia/packages/CUDA/2kjXI/src/compiler/compilation.jl:58
    [8] macro expansion
      @ ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:183 [inlined]
    [9] emit_llvm(job::GPUCompiler.CompilerJob; toplevel::Bool, libraries::Bool, optimize::Bool, cleanup::Bool, validate::Bool, only_entry::Bool)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/utils.jl:108
   [10] emit_llvm
      @ ~/.julia/packages/GPUCompiler/2CW9L/src/utils.jl:106 [inlined]
   [11] codegen(output::Symbol, job::GPUCompiler.CompilerJob; toplevel::Bool, libraries::Bool, optimize::Bool, cleanup::Bool, validate::Bool, strip::Bool, only_entry::Bool, parent_job::Nothing)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:100
   [12] compile(target::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{optimize::Bool, cleanup::Bool, validate::Bool, libraries::Bool})
      @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:79
   [13] compile
      @ ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:74 [inlined]
   [14] (::ReactantCUDAExt.var"#10#13"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}})(ctx::LLVM.Context)
      @ ReactantCUDAExt ~/.julia/packages/Reactant/7y9bj/ext/ReactantCUDAExt.jl:420
   [15] JuliaContext(f::ReactantCUDAExt.var"#10#13"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}}; kwargs::@Kwargs{})
      @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:34

@wsmoses
Copy link
Collaborator Author

wsmoses commented Feb 6, 2025

@glwagner your issue is distinct and a Macos-specific issue resolved in JuliaGPU/GPUCompiler.jl#660 (released an hour ago). It should work fine on linux (or mac once gpucompiler is updated across the ecosystem)

@glwagner
Copy link
Member

glwagner commented Feb 6, 2025

@glwagner your issue is distinct and a Macos-specific issue resolved in JuliaGPU/GPUCompiler.jl#660 (released an hour ago). It should work fine on linux (or mac once gpucompiler is updated across the ecosystem)

excellent news

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants