Add dynamic shared memory allocation #41

michael-kenzel · 2023-08-04T21:34:52Z

This adds a parameter to allocate a given amount of dynamic shared memory at kernel launch. Wrapper functions that simply pass 0 are provided for backwards compatibility with existing code. This is currently implemented for CUDA only; other platforms will report an error.

corresponding thorin changes: AnyDSL/thorin#144
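
The backwards-compatibility pattern described above (an extended entry point plus a wrapper that passes 0) can be sketched as follows. This is only an illustration in C++, not the runtime's actual API: `Dim3`, `LaunchConfig`, `launch_config_with_lmem`, and `launch_config` are hypothetical names, and only the `lmem` field name is borrowed from the `launch_params.lmem` member that appears in this PR's diff.

```cpp
#include <cstdint>

// Hypothetical 3D extent, standing in for the (i32, i32, i32) tuples.
struct Dim3 {
    std::uint32_t x, y, z;
};

// Hypothetical launch description. `lmem` is the number of bytes of
// dynamic shared (local) memory requested for the launch.
struct LaunchConfig {
    Dim3 grid;
    Dim3 block;
    std::uint32_t lmem;
};

// Extended entry point: the caller states the dynamic allocation explicitly.
constexpr LaunchConfig launch_config_with_lmem(Dim3 grid, Dim3 block, std::uint32_t lmem) {
    return LaunchConfig{grid, block, lmem};
}

// Backwards-compatible wrapper: existing call sites keep their signature
// and implicitly request 0 bytes of dynamic shared memory.
constexpr LaunchConfig launch_config(Dim3 grid, Dim3 block) {
    return launch_config_with_lmem(grid, block, 0);
}
```

Existing callers of the old signature are unaffected, while new code opts in by calling the `_with_lmem` variant.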

Hugobros3 · 2024-08-05T13:31:39Z

src/hsa_platform.cpp

@@ -409,7 +409,7 @@ void HSAPlatform::launch_kernel(DeviceId dev, const LaunchParams& launch_params)
    aql.kernel_object        = kernel_info.kernel;
    aql.kernarg_address      = kernel_info.kernarg_segment;
    aql.private_segment_size = kernel_info.private_segment_size;
-    aql.group_segment_size   = kernel_info.group_segment_size;
+    aql.group_segment_size   = (kernel_info.group_segment_size + 15) / 16 * 16 + launch_params.lmem;


What is the relevance of this change (the rounding part)?

It's to account for padding to align the start of the dynamic allocation.
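
The padding arithmetic can be checked in isolation. The snippet below is a minimal sketch, assuming a 16-byte alignment as in the diff; `align_up` and `total_group_segment_size` are hypothetical helper names for illustration and are not part of the runtime.

```cpp
#include <cstdint>

// Round `size` up to the next multiple of `alignment` (alignment > 0).
constexpr std::uint32_t align_up(std::uint32_t size, std::uint32_t alignment) {
    return (size + alignment - 1) / alignment * alignment;
}

// Total group segment: the statically known part, padded to 16 bytes so the
// dynamic region starts on an aligned offset, followed by the dynamic part.
constexpr std::uint32_t total_group_segment_size(std::uint32_t static_size,
                                                 std::uint32_t dynamic_size) {
    return align_up(static_size, 16) + dynamic_size;
}

static_assert(align_up(0, 16) == 0, "empty static segment needs no padding");
static_assert(align_up(20, 16) == 32, "20 bytes are padded to 32");
static_assert(align_up(32, 16) == 32, "aligned sizes are unchanged");
static_assert(total_group_segment_size(20, 256) == 288, "32 padded + 256 dynamic");
```

With a static segment of 20 bytes, the dynamic region starts at offset 32 rather than 20, so its first byte is 16-byte aligned.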

stlemme · 2024-08-05T13:32:10Z

platforms/artic/intrinsics_thorin.impala

+#[import(cc = "thorin", name = "opencl")] fn opencl_with_lmem(_dev: i32, _grid: (i32, i32, i32), _block: (i32, i32, i32), i32, _body: fn() -> ()) -> ();
+#[import(cc = "thorin", name = "amdgpu_hsa")] fn amdgpu_hsa_with_lmem(_dev: i32, _grid: (i32, i32, i32), _block: (i32, i32, i32), i32, _body: fn() -> ()) -> ();
+#[import(cc = "thorin", name = "amdgpu_pal")] fn amdgpu_pal_with_lmem(_dev: i32, _grid: (i32, i32, i32), _block: (i32, i32, i32), i32, _body: fn() -> ()) -> ();
+#[import(cc = "thorin")] fn local_memory() -> &mut addrspace(3)[u8];


Rename that to shared_memory_base to be more specific

We decided to call it "local memory" since that's a more adequate/common moniker outside of CUDA. But we can call it local_memory_base() I guess.
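
For context on why a base pointer to raw bytes is enough: a kernel typically carves typed arrays out of the single dynamic region itself. The following host-side C++ sketch illustrates that idea under stated assumptions. `LocalMemoryArena` and `take` are hypothetical names, the byte buffer merely stands in for whatever `local_memory()` returns, and the base address is assumed to be aligned at least as strictly as any type carved from it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bump allocator over one contiguous byte region.
struct LocalMemoryArena {
    unsigned char* base;  // start of the region (assumed suitably aligned)
    std::size_t size;     // region size in bytes
    std::size_t offset;   // first unused byte

    LocalMemoryArena(unsigned char* b, std::size_t s) : base(b), size(s), offset(0) {}

    // Reserve `count` elements of T, aligning the start offset to alignof(T).
    // Returns nullptr and leaves the arena unchanged if the region is too small.
    template <typename T>
    T* take(std::size_t count) {
        assert(base != nullptr);
        const std::size_t aligned = (offset + alignof(T) - 1) / alignof(T) * alignof(T);
        const std::size_t end = aligned + count * sizeof(T);
        if (end > size) return nullptr;
        offset = end;
        return reinterpret_cast<T*>(base + aligned);
    }
};
```

The amount passed at launch would then have to cover the sum of all carved arrays plus any alignment padding between them.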

richardmembarth · 2024-08-05T13:37:03Z

platforms/impala/intrinsics_thorin.impala

+fn @@cuda(dev: i32, grid: (i32, i32, i32), block: (i32, i32, i32), body: fn() -> ()) { cuda_with_lmem(dev, grid, block, 0, body) }
+fn @@nvvm(dev: i32, grid: (i32, i32, i32), block: (i32, i32, i32), body: fn() -> ()) { nvvm_with_lmem(dev, grid, block, 0, body) }
+fn @@opencl(dev: i32, grid: (i32, i32, i32), block: (i32, i32, i32), body: fn() -> ()) { opencl_with_lmem(dev, grid, block, 0, body) }
+fn @@amdgpu_hsa(dev: i32, grid: (i32, i32, i32), block: (i32, i32, i32), body: fn() -> ()) { amdgpu_hsa_with_lmem(dev, grid, block, 0, body) }
+fn @@amdgpu_pal(dev: i32, grid: (i32, i32, i32), block: (i32, i32, i32), body: fn() -> ()) { amdgpu_pal_with_lmem(dev, grid, block, 0, body) }


@@ should be @ at the function

michael-kenzel requested a review from richardmembarth August 4, 2023 21:34

michael-kenzel mentioned this pull request Aug 5, 2023

Add dynamic shared memory allocation AnyDSL/thorin#144

Open

michael-kenzel marked this pull request as draft August 9, 2023 12:27

michael-kenzel force-pushed the mikey/smem branch from 27f8713 to 4c2b42b Compare August 10, 2023 20:12

michael-kenzel force-pushed the mikey/smem branch 2 times, most recently from d538558 to d8902e2 Compare August 24, 2023 11:56

michael-kenzel force-pushed the mikey/smem branch from d8902e2 to 7ad75e2 Compare August 1, 2024 04:11

michael-kenzel marked this pull request as ready for review August 1, 2024 17:59

Hugobros3 reviewed Aug 5, 2024

stlemme reviewed Aug 5, 2024

richardmembarth reviewed Aug 5, 2024

Add dynamic shared memory allocation

3c6e9a4

michael-kenzel force-pushed the mikey/smem branch from 7ad75e2 to 3c6e9a4 Compare August 5, 2024 21:50
