Document the parallel systems we need to address and define a supporting platform model that works for GraphBLAS #14
Comments
I plan on starting a simple extension to address this, with a I will have methods as follows:
There could be a default global context that all user threads will share, say To use the context object:
In the future, if every GrB method had a descriptor, then perhaps that could be used instead of

The context should be simple: (1) create and set a Context, (2) use it implicitly in all GrB calls, then (3) disengage and free it. Attaching a context to a descriptor seems awkward and a shoe-horn to me. So at least at first, I will not connect the

I currently allow the user application to set the # of threads globally, or in a

I'm very close to starting work on the topics above, so any feedback now will be very useful. What I work out might be useful in a future spec as well. I must have this

One thing Tim M. didn't include in his list is how these resources will be used when aggressively exploiting non-blocking mode. I think that the context object will simplify this, but this is in the future (for me).

Say there are lots of user threads, and I'm trying to keep track of a computational DAG of GrB calls that I have not yet computed. I traverse the DAG, looking for things to optimize, execute, and so on. Not trivial, but imagine if this traversal also has to be thread-safe, with many user threads modifying it at the same time. That's a nightmare; I would need a parallel asynchronous data structure and algorithms for that.

Instead, I would insist that any optimization I do must be done by a user thread with its own context object, and I would state that any matrices modified by this user thread would go into a DAG within this specific context. If multiple user threads modified matrices within this context at the same time, results would be undefined. Then I don't have to create a parallel data structure, and it will be a lot easier.

Disengaging a context would imply a block, where all pending computations done while engaged must now be finished. Engaging a context would start a new empty DAG of pending computations. I don't see the need to assign matrices / vectors / scalars to a particular context.
I'm a long way from creating a DAG of GrB calls for doing kernel fusion, but I want to think ahead. With this context object, I can effectively treat any set of GrB calls as a single ordered list of calls, from a single user thread. Then I can rearrange them and fuse them more easily, with no worries about other user threads making changes while I'm analyzing the set of calls to GrB that I have so far.
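To make the "single ordered list per context" idea concrete, here is a minimal sketch in plain C. It is not SuiteSparse:GraphBLAS code: `op_context`, `op_engage`, `op_add`, `op_flush` and `op_disengage` are hypothetical names, a "GrB call" is reduced to "add a value to a target", and the only fusion shown is merging consecutive updates to the same target. It assumes one user thread per context, as proposed above, so the list itself needs no locking.

```c
#include <stddef.h>

#define MAX_PENDING 16

/* A deferred operation (hypothetical): add delta to *target. */
typedef struct { long *target; long delta; } pending_op;

/* Per-context ordered list of pending operations (stand-in for a DAG). */
typedef struct {
    pending_op ops[MAX_PENDING];
    size_t count;
} op_context;

/* Each user thread sees only the context it engaged. */
static _Thread_local op_context *engaged = NULL;

/* Engaging starts a new, empty list of pending work. */
static void op_engage(op_context *c) { c->count = 0; engaged = c; }

/* Execute all pending work in order, fusing runs of consecutive
   operations on the same target. Returns the number of fused
   "kernels" that were executed. */
static size_t op_flush(op_context *c)
{
    size_t kernels = 0, i = 0;
    while (i < c->count) {
        long *target = c->ops[i].target;
        long sum = 0;
        while (i < c->count && c->ops[i].target == target) {
            sum += c->ops[i].delta;
            i++;
        }
        *target += sum;
        kernels++;
    }
    c->count = 0;
    return kernels;
}

/* Disengaging implies a block: finish everything that is pending. */
static size_t op_disengage(void)
{
    size_t kernels = 0;
    if (engaged != NULL) {
        kernels = op_flush(engaged);
        engaged = NULL;
    }
    return kernels;
}

/* With no context engaged the call runs immediately; otherwise it is
   deferred, flushing first if the list is full. */
static void op_add(long *target, long delta)
{
    if (engaged == NULL) { *target += delta; return; }
    if (engaged->count == MAX_PENDING) op_flush(engaged);
    engaged->ops[engaged->count++] = (pending_op){ target, delta };
}
```

Because only the owning thread appends to or traverses the list, the analysis and fusion step can be ordinary sequential code. A real implementation would track dependencies between matrices rather than a flat list.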
I got my draft GxB_Context object working and the results are great ... except that an older gcc compiler (v9.4.0, not very old) struggles with nested parallelism. Both gcc 12.2.0 and icx 2022 work great. See this discussion: GraphBLAS/graphblas-api-c#74 and the results I've posted in my tagged version of SuiteSparse:GraphBLAS (v8.0.0.draft1): https://github.com/DrTimothyAldenDavis/GraphBLAS/blob/v8.0.0.draft.1/Demo/Program/context_demo.c

Using this GxB_Context object to get nested parallelism is very easy. It would be harder to do the same thing with descriptors, even if all GrB methods and operations took a descriptor. Here's what the demo looks like, simplified a bit. The code builds nmat matrices with GrB_Matrix_build, from the same I,J,X (useless, of course, since all of the constructed matrices have the same content, but it's a simple test).
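The demo code itself is not reproduced here. As a rough illustration of the bookkeeping behind nested parallelism, the sketch below splits a thread budget between outer user threads and the inner threads each one would request through its own context, and assigns each outer thread a contiguous range of the nmat matrices. The names `thread_split`, `split_threads` and `matrix_range` are hypothetical, and the rule "inner = budget / outer" is an assumption, not necessarily what context_demo.c does.

```c
/* Hypothetical split of a thread budget into outer x inner threads. */
typedef struct { int nouter; int ninner; } thread_split;

/* Clamp nouter to [1, nthreads_max] and give each outer thread an
   equal share of the budget for its inner parallel regions. */
static thread_split split_threads(int nthreads_max, int nouter)
{
    thread_split s;
    if (nthreads_max < 1) nthreads_max = 1;
    if (nouter < 1) nouter = 1;
    if (nouter > nthreads_max) nouter = nthreads_max;
    s.nouter = nouter;
    s.ninner = nthreads_max / nouter;
    return s;
}

/* Contiguous half-open range [first, last) of matrix indices that
   outer thread tid (0 <= tid < nouter) would build. */
static void matrix_range(int nmat, int nouter, int tid,
                         int *first, int *last)
{
    *first = (int) (((long long) tid * nmat) / nouter);
    *last  = (int) (((long long) (tid + 1) * nmat) / nouter);
}
```

Each outer thread would then create its own context, set its thread count to `ninner`, engage it, build the matrices in its range, and disengage and free the context.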
I can explain the notion of "engage" and "disengage" for the GxB_Context. Briefly, I keep a threadprivate object, called GB_CONTEXT_THREAD, here: The "engage" operation simply does GB_CONTEXT_THREAD = Context ; and "disengage" does GB_CONTEXT_THREAD = NULL ; The user cannot access this threadprivate variable.

There is a built-in world context, GxB_CONTEXT_WORLD, that is user visible and always non-NULL. Setting its nthreads controls the # of threads a GrB function uses if GB_CONTEXT_THREAD is NULL. If GB_CONTEXT_THREAD is not NULL, then its settings are used inside GrB instead. SuiteSparse:GraphBLAS doesn't use nested parallelism itself, so all I have to do is use the nthreads setting from GxB_CONTEXT_WORLD or GB_CONTEXT_THREAD to control all my parallel regions, which all have a num_threads(...) clause.
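The lookup rule described above can be sketched in a few lines of plain C. This is a stand-in, not the library's source: `ctx_t`, `ctx_world`, `ctx_thread`, `ctx_engage`, `ctx_disengage` and `ctx_nthreads` are hypothetical names, only the thread count is modeled, and C11 `_Thread_local` is used in place of an OpenMP threadprivate variable.

```c
#include <stddef.h>

/* Hypothetical context object; only nthreads is modeled. */
typedef struct { int nthreads; } ctx_t;

/* World context: always present, used when nothing is engaged. */
static ctx_t ctx_world = { 1 };

/* One pointer per user thread, hidden from the user application. */
static _Thread_local ctx_t *ctx_thread = NULL;

/* Engage and disengage are plain pointer assignments. */
static void ctx_engage(ctx_t *c) { ctx_thread = c; }
static void ctx_disengage(void)  { ctx_thread = NULL; }

/* Thread count a library call made on this thread would pass to its
   parallel regions: the engaged context if any, else the world. */
static int ctx_nthreads(void)
{
    const ctx_t *c = (ctx_thread != NULL) ? ctx_thread : &ctx_world;
    return c->nthreads;
}
```

Since the pointer is per thread, two user threads can engage different contexts at the same time without synchronizing with each other.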
The GxB_Context also contains information on which GPU to use, so this same code could be used to compute on multiple GPUs, very easily (once we have a set of GPU kernels for GrB_Matrix_build, of course).
We need to document the parallel systems we must be able to support with the GraphBLAS. This would include:
We need a platform model that appropriately abstracts systems composed of the above. It must deal with the complexity of the various memory spaces and support arbitrary, dynamic partitions of the above.
Finally, we need a way to deal with nonblocking GraphBLAS operations as part of a larger execution context that supports asynchronous execution. I will add a separate issue for this topic.