Cleanup CUDA, Reuse Memory, Add Serial Model, Cleanup Std Parallelism #202
base: develop
This commit puts benchmarks in control of allocating the host memory used for verifying the results. Benchmarks that use Unified Memory for their device allocations can then skip the host-side allocation and pass pointers to the device allocation directly to the benchmark driver. Closes UoB-HPC#128.
#ifdef INDICES
// NVHPC workaround: TODO: remove this eventually
#if defined(__NVCOMPILER) && defined(_NVHPC_STDPAR_GPU)
#define WORKAROUND
Add a #pragma message that reports when workarounds are enabled.
#else

// auto exe_policy = dpl::execution::seq;
// auto exe_policy = dpl::execution::par;
static constexpr auto exe_policy = dpl::execution::par_unseq;
#define USE_STD_PTR_ALLOC_DEALLOC
#define WORKAROUND
Add a #pragma message here as well, to highlight that a workaround is in use.
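A minimal sketch of what the two review comments ask for, assuming the compiler accepts `#pragma message` (GCC, Clang, MSVC, and NVHPC do). The message text and the helper names `workaround_enabled` and `workaround_banner` are hypothetical and not taken from the PR; only the `WORKAROUND` macro and the NVHPC guard come from the diff.

```cpp
#include <string>

// Guard copied from the diff: the workaround applies only to NVHPC GPU stdpar builds.
#if defined(__NVCOMPILER) && defined(_NVHPC_STDPAR_GPU)
  #define WORKAROUND
#endif

#ifdef WORKAROUND
  // Printed by the compiler once per translation unit that includes this code.
  // The wording is illustrative only.
  #pragma message("NVHPC stdpar workaround enabled")
#endif

// Hypothetical helper: lets the benchmark report the same fact at run time.
constexpr bool workaround_enabled() {
#ifdef WORKAROUND
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: one line suitable for the benchmark's startup output.
inline std::string workaround_banner() {
  return workaround_enabled() ? "workarounds: enabled" : "workarounds: none";
}
```

The compile-time message covers the build log, while the run-time banner makes the same information visible in benchmark output that may be archived separately from the build.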
@@ -1,5 +1,5 @@

// Copyright (c) 2015-23 Tom Deakin, Simon McIntosh-Smith, and Tom Lin
// Copyright (c) 2015-16 Tom Deakin, Simon McIntosh-Smith,
Undo this change
Cleanup CUDA
Add Serial
By @tom91136. A serial model is a useful baseline when comparing against other parallel programming models, mostly for syntax.
This also makes us consistent with CloverLeaf, TeaLeaf, and miniBUDE.
Reuse Memory
This PR puts benchmarks in control of allocating the host memory used for verifying the results.
Benchmarks that use Unified Memory for their device allocations can then skip the host-side allocation and pass pointers to the device allocation directly to the benchmark driver.
Closes #128.
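The ownership change can be sketched roughly as follows. This is an illustration in plain C++ under stated assumptions, not the PR's actual code: the interface `Stream`, its `get_arrays` signature, the two backend classes, and `checksum` are all hypothetical, and `std::vector` stands in for both Unified Memory and discrete device memory.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical interface: the benchmark, not the driver, decides where the
// host-visible results live and hands out read-only pointers to them.
template <typename T>
struct Stream {
  virtual ~Stream() = default;
  virtual void init_arrays(T a, T b, T c) = 0;
  virtual void get_arrays(T const*& a, T const*& b, T const*& c) = 0;
};

// Stand-in for a Unified Memory backend: the "device" buffers are directly
// readable by the host, so no extra host allocation or copy is needed.
template <typename T>
class UnifiedStream : public Stream<T> {
  std::vector<T> a_, b_, c_;
public:
  explicit UnifiedStream(std::size_t n) : a_(n), b_(n), c_(n) {}
  void init_arrays(T a, T b, T c) override {
    a_.assign(a_.size(), a); b_.assign(b_.size(), b); c_.assign(c_.size(), c);
  }
  void get_arrays(T const*& a, T const*& b, T const*& c) override {
    a = a_.data(); b = b_.data(); c = c_.data();  // pointers to the allocation itself
  }
};

// Stand-in for a discrete-memory backend: it owns host staging buffers and
// refreshes them from the "device" buffers whenever results are requested.
template <typename T>
class StagedStream : public Stream<T> {
  std::vector<T> d_a_, d_b_, d_c_;  // pretend device memory
  std::vector<T> h_a_, h_b_, h_c_;  // host staging, owned by the benchmark
public:
  explicit StagedStream(std::size_t n) : d_a_(n), d_b_(n), d_c_(n) {}
  void init_arrays(T a, T b, T c) override {
    d_a_.assign(d_a_.size(), a); d_b_.assign(d_b_.size(), b); d_c_.assign(d_c_.size(), c);
  }
  void get_arrays(T const*& a, T const*& b, T const*& c) override {
    h_a_ = d_a_; h_b_ = d_b_; h_c_ = d_c_;        // models a device-to-host copy
    a = h_a_.data(); b = h_b_.data(); c = h_c_.data();
  }
};

// Driver-side verification only reads through the pointers it is given.
template <typename T>
T checksum(Stream<T>& s, std::size_t n) {
  T const* a = nullptr; T const* b = nullptr; T const* c = nullptr;
  s.get_arrays(a, b, c);
  T sum = T{};
  for (std::size_t i = 0; i < n; ++i) sum += a[i] + b[i] + c[i];
  return sum;
}
```

With this shape the driver never allocates verification buffers itself, so a backend whose memory is already host-accessible pays nothing extra, while other backends keep their staging buffers private.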
Cleanup C++ Standard Parallelism
Merges the three implementations into one, selected by flags for data (C++17), data (C++23), and indices.
Also annotates workarounds with a
#define WORKAROUND
and prints whether the current implementation is non-conforming.
Adds support for AdaptiveCpp (CI not added yet; will be done later as part of removing hipSYCL).
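For the merged C++ Standard Parallelism implementation, a single source file selected by preprocessor flags might look roughly like the sketch below. Only `INDICES` appears in the diff; the `DATA17` flag name, the `variant_name` and `triad` functions, and the default selection are assumptions made for illustration. The C++23 data variant and the execution policy are left out so the sketch stays dependency-free.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical default: fall back to the C++17 data variant if no flag is given.
#if !defined(DATA17) && !defined(INDICES)
  #define DATA17
#endif

// Reports which variant this translation unit was compiled as.
inline std::string variant_name() {
#if defined(INDICES)
  return "indices";
#else
  return "data17";
#endif
}

// Triad kernel: a[i] = b[i] + scalar * c[i].
inline void triad(std::vector<double>& a, const std::vector<double>& b,
                  const std::vector<double>& c, double scalar) {
#if defined(INDICES)
  // Index variant: iterate over 0..n-1 and address the arrays inside the body.
  // A materialised index vector keeps this sketch portable to older standards.
  std::vector<std::size_t> idx(a.size());
  std::iota(idx.begin(), idx.end(), std::size_t{0});
  std::for_each(idx.begin(), idx.end(),
                [&](std::size_t i) { a[i] = b[i] + scalar * c[i]; });
#else
  // Data variant: iterate over the elements themselves.
  std::transform(b.begin(), b.end(), c.begin(), a.begin(),
                 [scalar](double bi, double ci) { return bi + scalar * ci; });
#endif
}
```

Both branches compute the same result, so a build with `-DINDICES` and a default build can be checked against each other with the same verification code.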