Implement a new ReactantState architecture #4071

wsmoses · 2025-01-29T12:13:28Z

This allows us to use Reactant's traced arrays with Oceananigans, which permits optimization via XLA.

glwagner · 2025-01-29T16:03:07Z

src/Architectures.jl

 #####
 ##### These methods are extended in DistributedComputations.jl
 #####

 device(::CPU) = KernelAbstractions.CPU()
 device(::GPU) = CUDA.CUDABackend(; always_inline=true)
+# While there is no Reactant backend this worls
+device(::ReactantState) = CUDA.CUDABackend(; always_inline=true)


Does it make sense to let the backend be a property of ReactantState?

Currently this tells Reactant that we are importing CUDA kernel source code, which is the correct default for now (the choice should move into the KA Reactant backend once that is ready).

ok I see, I had a misconception about the meaning of "backend" and "device"

I guess a comment would be appropriate here to describe what is happening

Added comment

glwagner · 2025-01-29T16:04:18Z

.buildkite/pipeline.yml

+      automatic:
+        - exit_status: 1
+          limit: 1
+    depends_on: "init_cpu"


Does it make sense to have CPU tests if the backend is hardcoded for GPU? I may be missing something…

It is not hardcoded for GPU, counterintuitively.

Reactant takes in the CUDA kernels but then decides to execute them on whatever the Reactant default backend is (which could be CPU or TPU; we just landed GPU-kernel-to-CPU-kernel support last night, and TPU support is in progress).
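
To make the point above concrete, here is a minimal sketch of the dispatch being discussed. The architecture and backend types below are hypothetical stand-ins so that the snippet runs without Oceananigans, CUDA.jl, or KernelAbstractions.jl; only the three device methods mirror the diff in src/Architectures.jl.

```julia
# Hypothetical stand-ins for the Oceananigans architecture types.
abstract type AbstractArchitecture end
struct CPU <: AbstractArchitecture end
struct GPU <: AbstractArchitecture end
struct ReactantState <: AbstractArchitecture end

# Hypothetical stand-ins for KernelAbstractions.CPU() and CUDA.CUDABackend(...).
struct CPUBackend end
struct CUDABackend
    always_inline::Bool
end
CUDABackend(; always_inline=false) = CUDABackend(always_inline)

# Same shape as the methods in the diff: the return value selects which
# kernel source is used, not where the program ultimately runs.
device(::CPU) = CPUBackend()
device(::GPU) = CUDABackend(; always_inline=true)
device(::ReactantState) = CUDABackend(; always_inline=true)
```

In this sketch, device(ReactantState()) and device(GPU()) return equal values. Per the comment above, the execution target is chosen separately by Reactant's default backend, which the sketch does not model.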

simone-silvestri · 2025-01-29T16:11:31Z

we could introduce this change as part of this PR: #3475 (comment)

ext/OceananigansReactantExt/Architectures.jl

simone-silvestri · 2025-01-29T16:27:05Z

We are also missing a method for

Oceananigans.jl/src/Architectures.jl

Line 118 in 06a7c1b

@inline convert_args(::CPU, args) = args

necessary for split explicit free surface.
Also over here we can just use the KA equivalent instead of manually converting

simone-silvestri · 2025-01-29T16:28:09Z

for reference
https://github.com/JuliaGPU/KernelAbstractions.jl/blob/8a87f77e0e8f4d435006ab73185817f4b1b83dbe/src/KernelAbstractions.jl#L789

glwagner · 2025-01-29T18:21:37Z

Can we improve the convert_args interface? It's pretty unclear what exactly we are "converting" to. Should it be convert_to_device or something? What does "args" mean?

glwagner · 2025-01-29T18:34:56Z

@jm-c can you please approve this PR so that it can be merged?

simone-silvestri · 2025-01-29T22:06:02Z

I would wait a second, there are still some items to be solved

simone-silvestri · 2025-01-29T22:15:12Z

Sorry just realized this must have been a confusion with #4070 😅

simone-silvestri · 2025-01-29T22:17:41Z

src/Architectures.jl

 #####
 ##### These methods are extended in DistributedComputations.jl
 #####

 device(::CPU) = KernelAbstractions.CPU()
 device(::GPU) = CUDA.CUDABackend(; always_inline=true)
+# While there is no Reactant backend this worls


Suggested change

-# While there is no Reactant backend this worls
+# Temporary fix before we develop a Reactant backend

glwagner · 2025-01-29T23:01:02Z

what the heck. no wonder #4070 didn't get approved HAHA

giordano · 2025-02-06T12:47:20Z

With Reactant#main + Oceananigans#main I get

ERROR: The following 1 direct dependency failed to precompile:

OceananigansReactantExt

Failed to precompile OceananigansReactantExt [abc3f31f-7b77-5c83-815b-0e826f781516] to "/home/giordano/.julia/compiled/v1.11/OceananigansReactantExt/jl_0QuYuA".
AssertionError("Could not find registered platform with name: \"cuda\". Available platform names are: ")
ERROR: LoadError: type Nothing has no field ReactantBackend
Stacktrace:
 [1] getproperty(x::Nothing, f::Symbol)
   @ Base ./Base.jl:49
 [2] top-level scope
   @ ~/.julia/packages/Oceananigans/BuM1I/ext/OceananigansReactantExt/Architectures.jl:11
 [3] include(mod::Module, _path::String)
   @ Base ./Base.jl:557
 [4] include(x::String)
   @ OceananigansReactantExt ~/.julia/packages/Oceananigans/BuM1I/ext/OceananigansReactantExt/OceananigansReactantExt.jl:1
 [5] top-level scope
   @ ~/.julia/packages/Oceananigans/BuM1I/ext/OceananigansReactantExt/OceananigansReactantExt.jl:5
 [6] include
   @ ./Base.jl:557 [inlined]
 [7] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::Nothing)
   @ Base ./loading.jl:2881
 [8] top-level scope
   @ stdin:6
in expression starting at /home/giordano/.julia/packages/Oceananigans/BuM1I/ext/OceananigansReactantExt/Architectures.jl:1
in expression starting at /home/giordano/.julia/packages/Oceananigans/BuM1I/ext/OceananigansReactantExt/OceananigansReactantExt.jl:1

glwagner · 2025-02-06T15:29:35Z

OK, so this is a bug we get on a machine with no GPU? Can we set up a test on the GitHub Actions pipeline to catch it?

tomchor · 2025-02-06T15:32:18Z

ext/OceananigansReactantExt/OceananigansReactantExt.jl

I know this was already merged, but should these commented lines be here?

I can't see the lines you're referring to

Probably

Oceananigans.jl/ext/OceananigansReactantExt/OceananigansReactantExt.jl

Lines 7 to 18 in a6965f7

# include("Utils.jl")
# include("BoundaryConditions.jl")
# include("Fields.jl")
# include("MultiRegion.jl")
# include("Solvers.jl")

using .Architectures
# using .Utils
# using .BoundaryConditions
# using .Fields
# using .MultiRegion
# using .Solvers

Possibly they were notes-to-self for @wsmoses. I can elaborate on the comments in a subsequent PR, which I think we will need anyway to set up tests on a non-GPU machine per @giordano's issues.

Yeah, sorry, these were indeed the lines. I accidentally tagged the wrong thing.

Yeah so these are "additional modules that may need to be extended", helpful for people working on compat who aren't intimately familiar with the internal structure of Oceananigans. I can add that note in a subsequent PR

wsmoses · 2025-02-06T15:32:19Z

No this doesn't have anything to do with CPU.

The issue is that

Oceananigans.jl/ext/OceananigansReactantExt/Architectures.jl

Line 8 in c1014d0

const ReactantKernelAbstractionsExt = Base.get_extension(

returns nothing.

How did you do your package setup?

Maybe we should do a release to fix

cc @gbaraldi
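
A minimal sketch of guarding against this failure mode, assuming only that Base.get_extension returns nothing when the requested extension is not loaded. The helper name require_extension and the names in the usage comment are hypothetical and not part of Oceananigans or Reactant:

```julia
# Hypothetical helper: fetch a package extension module, failing with a clear
# message instead of a later "type Nothing has no field ..." error.
function require_extension(parent::Module, name::Symbol)
    ext = Base.get_extension(parent, name)
    if ext === nothing
        error("Extension $name of $(nameof(parent)) is not loaded; ",
              "check that its trigger packages are installed and imported.")
    end
    return ext
end

# Usage (SomePackage and :SomePackageExt are placeholders):
# const SomePackageExt = require_extension(SomePackage, :SomePackageExt)
```

Failing at the lookup makes the missing extension the reported error, rather than the getproperty call on nothing shown in the stack trace above.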

tomchor · 2025-02-06T15:36:13Z

I'm just becoming aware of this PR (after it was merged) and I'm wondering if the top comment can be updated with some context. I guess ReactantState has to do with https://github.com/EnzymeAD/Reactant.jl ?

A general comment is that I think we should avoid PRs with no context/explanation except for very small self-explanatory cases. (Fixing a typo for example)

glwagner · 2025-02-06T16:23:48Z

Oops, I didn't mean to delete your comment @giordano !

glwagner · 2025-02-06T16:26:02Z

I get something similar

Reactant Super Simple Simulation Tests: Error During Test at /Users/gregorywagner/Projects/dev/Oceananigans.jl/test/test_reactant.jl:16
  Got exception outside of a @test
  ArgumentError: No available targets are compatible with triple "nvptx64-nvidia-cuda"
  Stacktrace:
    [1] LLVM.Target(; name::Nothing, triple::String)
      @ LLVM ~/.julia/packages/LLVM/b3kFs/src/target.jl:33
    [2] Target
      @ ~/.julia/packages/LLVM/b3kFs/src/target.jl:23 [inlined]
    [3] llvm_machine(target::GPUCompiler.PTXCompilerTarget)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/ptx.jl:49
    [4] macro expansion
      @ ~/.julia/packages/GPUCompiler/2CW9L/src/ptx.jl:140 [inlined]
    [5] macro expansion
      @ ~/.julia/packages/LLVM/b3kFs/src/base.jl:97 [inlined]
    [6] finish_module!(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget}, mod::LLVM.Module, entry::LLVM.Function)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/ptx.jl:137
    [7] finish_module!(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, mod::LLVM.Module, entry::LLVM.Function)
      @ CUDA ~/.julia/packages/CUDA/2kjXI/src/compiler/compilation.jl:58
    [8] macro expansion
      @ ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:183 [inlined]
    [9] emit_llvm(job::GPUCompiler.CompilerJob; toplevel::Bool, libraries::Bool, optimize::Bool, cleanup::Bool, validate::Bool, only_entry::Bool)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/utils.jl:108
   [10] emit_llvm
      @ ~/.julia/packages/GPUCompiler/2CW9L/src/utils.jl:106 [inlined]
   [11] codegen(output::Symbol, job::GPUCompiler.CompilerJob; toplevel::Bool, libraries::Bool, optimize::Bool, cleanup::Bool, validate::Bool, strip::Bool, only_entry::Bool, parent_job::Nothing)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:100
   [12] compile(target::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{optimize::Bool, cleanup::Bool, validate::Bool, libraries::Bool})
      @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:79
   [13] compile
      @ ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:74 [inlined]
   [14] (::ReactantCUDAExt.var"#10#13"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}})(ctx::LLVM.Context)
      @ ReactantCUDAExt ~/.julia/packages/Reactant/7y9bj/ext/ReactantCUDAExt.jl:420
   [15] JuliaContext(f::ReactantCUDAExt.var"#10#13"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}}; kwargs::@Kwargs{})
      @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:34

wsmoses · 2025-02-06T17:26:53Z

@glwagner your issue is distinct: it is a macOS-specific issue resolved in JuliaGPU/GPUCompiler.jl#660 (released an hour ago). It should work fine on Linux (or on macOS once GPUCompiler is updated across the ecosystem).

glwagner · 2025-02-06T17:37:38Z

excellent news

Reactant ext

7ed9e62

wsmoses requested review from glwagner and simone-silvestri January 29, 2025 12:13

fix

25a72e2

glwagner approved these changes Jan 29, 2025

glwagner reviewed Jan 29, 2025

Update Architectures.jl

401a51d

simone-silvestri reviewed Jan 29, 2025

Update Architectures.jl

14f0799

Update Architectures.jl

fbc78c2

simone-silvestri reviewed Jan 29, 2025

simone-silvestri and others added 2 commits January 29, 2025 23:21

Merge branch 'main' into reactant

20e3836

Update Architectures.jl

0cf9c8f

wsmoses and others added 2 commits January 30, 2025 00:08

Update Architectures.jl

bb121c8

some fixes

ac8cd58

simone-silvestri approved these changes Jan 30, 2025

glwagner changed the title ~~Reactant ext~~ Implement a new ReactantState architecture Jan 31, 2025

glwagner and others added 5 commits January 31, 2025 16:06

Clean up zeros

61cc792

Transition to KA ext

a3d7ae3

Update OceananigansReactantExt.jl

3d81a6c

fix

4727a40

fix

9bf3cfd

glwagner and others added 18 commits February 1, 2025 08:52

Use zeros(arch, FT, ...) to mirror KA form zeros(device, FT, ...)

650a1d1

Fix grid-based zeros

050ebe5

Reduce size of reactant test

67c2829

Bump to latest reactant

68047cd

fix

49916b9

fix

704e427

fix

22df05b

fixups

42539cf

fix

d2ecef6

Update test_reactant.jl

cf55499

cudevice

9dbc4a0

Don't have reactant preallocate mem

c0fdb46

fix dist

8d56ead

Update test_reactant.jl

3f067cf

Update test_reactant.jl

1d04652

Update test_reactant.jl

ad7e53d

Update test_reactant.jl

195e133

Merge branch 'main' into reactant

a6965f7

wsmoses merged commit c1014d0 into main Feb 6, 2025
50 checks passed

tomchor reviewed Feb 6, 2025

wsmoses deleted the reactant branch February 6, 2025 15:34

CliMA deleted a comment from giordano Feb 6, 2025
