Use a mutex to lock heap memory access #94
Conversation
This also shows that putting a mutex on a dtype seems to not be a big deal in the single-threaded case. For ufuncs and casts we can acquire and release the mutex outside the core loop, so it scales nicely too. I wonder if we could fix a ton of threading bugs in numpy by explicitly acquiring a per-dtype mutex before releasing the GIL in numpy.

My understanding of the current version of PEP 703 is that yes, we can assume that.
I just added a test for multithreaded access and mutation, and things seem to be working correctly. I also fixed two unrelated bugs I encountered while writing that test, and fixed how I release the mutex: I need to account for cases where the input descriptors are identical objects and not acquire or release the mutex more than once in those cases.
```python
np.testing.assert_array_equal(arr, uarr)


def test_threaded_access_and_mutation(dtype, random_string_list):
```
@seberg I based this off a test in numpy you wrote. Thanks for the example! Do you think there are any other sorts of operations that would be good to add to this test, or problems with this implementation? It definitely does catch threading errors: I noticed that the original version of this PR wasn't releasing mutexes correctly after adding it.
IIRC this test was specifically designed for setitem/getitem, I think (maybe only setitem) with structured dtypes, because it is a hairy recursing monster.

I suppose it might be good to have a single test that runs a ufunc (reading) in one thread and assigns to the same array in another, to have one basic ufunc locking test? (Of course you can additionally use a ufunc to assign to the same array with `out=`.)
Hmmm, I suppose it already runs a ufunc, so whatever :). (I half suspect I didn't write this, but took it/adjusted it from an issue :))
This is adapted from the structured dtype test, not just a copy/paste, so it's not quite the same.
Thanks for giving it a look!
BTW, once this PR is merged the stringdtype prototype is functionally done and I'm ready to move to integrating it into numpy. I'll probably update the NEP first though. Maybe I'll also include a call for testing the prototype when I update the mailing list?
Force-pushed from 3b9c7cc to 44d9943.
Performance impact for single-threaded use seems minimal. Here's a test on
And with this PR:
This kind of tiny array is the worst case for a ufunc too, since we acquire and release the mutex outside the inner loop.

Not quite true: there are some cases where NumPy will opt to call the inner loop with few items, but many times. But whatever, the only alternative would be to use the auxdata to lock for the whole operation, and that might not even work easily since we may be using casts at the same time.
This adds an `allocator_lock` field to the `StringDTypeObject` struct. This is a `PyThread_type_lock` mutex that must be acquired before manipulating the allocator. I defined `NPY_STRING_ACQUIRE_ALLOCATOR` (as well as variants that accept 2 and 3 descriptors) and `NPY_STRING_RELEASE_ALLOCATOR` macros that handle locking and unlocking the mutex.

There's nothing in the API that prevents someone from touching the `allocator` field before acquiring the `allocator_lock`. Maybe it would be nicer to wrap a type around the allocator, or is there a more idiomatic way to write this in C?

Most of this PR is then applying these macros where appropriate, including in error handling cases. A few places that didn't exit properly from error cases, and other similar bugs, are fixed as well.
I wasn't sure what to do in `stringdtype_dealloc`. Can we assume because of the GIL that a dtype is deallocated by only one thread?

I haven't benchmarked the single-threaded performance hit or added threaded tests, but am planning to add that before merging.
I based this implementation off of this tutorial.