ToDo
- The GPU reduction is actually slower than the CPU reduction, even for large array sizes.
- Properly implement the leader and follower functions for chpl__initOnLocales in ChapelLocale.chpl.
- Add the HSA runtime source code to the third-party directory instead of binaries (the current binaries do not work anyway because of an HSA version mismatch).
- Add test cases.
- Determine how to identify whether a sublocale has GPU support. Right now this is based only on the sublocale ID (0 = CPU, 1 = GPU).
- Check that the array is a rectangular 1D array before invoking GPU reductions.
- Make sure execution on the parent locale also goes to the CPU sublocale, or otherwise decide what happens when only the parent locale is specified.
- Implement GPU reductions for all data types and reduction functions. Right now, only the int32 sum reduction is implemented.
- Decide how many queues should be created.
- Decide on multi-GPU support.
- Handle asynchronous GPU kernel execution. Right now, execution on the GPU is always synchronous.
- Fix data allocation methods for the GPU sublocale.
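The reduction and test-case items could share a host-side reference to check GPU results against. Below is a minimal sketch in Python, used purely as a test oracle and not as the project's implementation. It defines a reduction that is generic over element type and operator, in two forms. The first is a sequential left fold, which is the CPU-style baseline. The second is a pairwise, tree-shaped reduction, similar in combining order to many GPU reduction kernels. All names (`sequential_reduce`, `tree_reduce`, `int32_add`) are hypothetical. `int32_add` assumes wraparound on overflow, which is an assumption and is not specified by the ToDo items.

```python
import operator
from functools import reduce
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def sequential_reduce(data: Sequence[T], op: Callable[[T, T], T], identity: T) -> T:
    """Left-to-right fold: op(...op(op(identity, d0), d1)..., dn)."""
    return reduce(op, data, identity)


def tree_reduce(data: Sequence[T], op: Callable[[T, T], T], identity: T) -> T:
    """Pairwise (tree-shaped) reduction.

    Each pass combines neighbours (0,1), (2,3), ... and halves the list.
    An odd-length pass is padded with the identity element. The result
    only matches sequential_reduce when op is associative.
    """
    values = list(data)
    if not values:
        return identity
    while len(values) > 1:
        if len(values) % 2:
            values.append(identity)  # op(x, identity) == x, so padding is harmless
        values = [op(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    return values[0]


def wrap_int32(x: int) -> int:
    """Map an arbitrary Python int onto the two's-complement int32 range (assumption)."""
    return (x + 2**31) % 2**32 - 2**31


def int32_add(a: int, b: int) -> int:
    """Hypothetical int32 sum operator with wraparound on overflow."""
    return wrap_int32(a + b)


if __name__ == "__main__":
    data = list(range(1, 101))
    # Both orders agree for an associative operator such as integer addition.
    print(sequential_reduce(data, operator.add, 0), tree_reduce(data, operator.add, 0))
```

Because floating-point addition is not associative, the sequential and tree orders can give slightly different results for float data. Tests for floating-point reductions would therefore compare with a tolerance rather than exact equality.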