Abhisek edited this page Mar 7, 2016 · 4 revisions

This page lists the to-do items and issues in no particular order:

  • The GPU reduction is actually slower than the CPU reduction, even for large array sizes.

  • Properly implement the leader and follower functions for chpl__initOnLocales in ChapelLocale.chpl.

  • Add the HSA runtime source code to the third-party directory instead of binaries (even the binaries do not work right now because of an HSA version mismatch).

  • Add test cases

  • Determine how to identify that a sublocale has GPU support. Right now this is based only on the sublocale ID (0 = CPU, 1 = GPU).

  • Check that the array is a rectangular 1D array before invoking GPU reductions.

  • Make sure execution on the parent locale also goes to the CPU sublocale, or otherwise decide what happens when only the parent locale is specified.

  • Implement GPU reductions for all data types and reduction functions. Right now, only the int32 sum is implemented.

  • Decide how many queues should be created.

  • Decide on multi-GPU support.

  • Handle asynchronous GPU kernel execution. Right now, execution on the GPU is always synchronous.

  • Fix data allocation methods for the GPU sublocale.
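
Several of the items above (the ID-based sublocale check, the rectangular 1D check, and the int32 / sum restriction) amount to one guard that decides whether a reduction may go to the GPU at all. The following is only a rough sketch of what such a guard might look like on the C side. Every name in it (`SUBLOC_GPU`, `array_desc`, `gpu_reduce_supported`, `cpu_sum_int32`) is hypothetical and does not exist in the Chapel runtime or in HSA; the 0 = CPU, 1 = GPU convention is the only part taken from this page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of these are real Chapel runtime symbols. */

/* Current convention noted on this page: sublocale 0 is the CPU, 1 is the GPU. */
enum { SUBLOC_CPU = 0, SUBLOC_GPU = 1 };

typedef enum { ELT_INT32, ELT_OTHER } elt_kind;
typedef enum { OP_SUM, OP_OTHER } reduce_op;

/* Assumed minimal description of the array being reduced. */
typedef struct {
  int rank;          /* number of dimensions */
  bool rectangular;  /* true for a rectangular (non-sparse, non-associative) array */
  elt_kind elt;      /* element type */
} array_desc;

/* True only when the reduction fits what is implemented so far:
   GPU sublocale, rectangular 1D array, int32 elements, sum. */
static bool gpu_reduce_supported(int subloc_id, const array_desc *a, reduce_op op) {
  return subloc_id == SUBLOC_GPU
      && a->rank == 1
      && a->rectangular
      && a->elt == ELT_INT32
      && op == OP_SUM;
}

/* CPU fallback for the one implemented case, used when the guard fails. */
static int32_t cpu_sum_int32(const int32_t *data, size_t n) {
  int32_t sum = 0;
  for (size_t i = 0; i < n; i++) {
    sum += data[i];
  }
  return sum;
}
```

Keeping the checks in one place would mean that replacing the ID comparison with a real capability query, or adding more data types and operations, only touches this guard.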
