
Use a mutex to lock heap memory access #94

Merged: 5 commits, merged into numpy:main on Nov 10, 2023

Conversation
Conversation

ngoldbaum (Member)

This adds an allocator_lock field to the StringDTypeObject struct. This is a PyThread_type_lock mutex that must be acquired before manipulating the allocator.

I defined NPY_STRING_ACQUIRE_ALLOCATOR (as well as variants that accept 2 and 3 descriptors) and NPY_STRING_RELEASE_ALLOCATOR macros that handle locking and unlocking the mutex.
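For reference, a minimal sketch of what that looks like (the exact field layout and macro bodies in the PR may differ; the allocator member type here is a placeholder):

#include <Python.h>
#include <pythread.h>            /* PyThread_type_lock and friends */
#include "numpy/arrayobject.h"   /* PyArray_Descr */

typedef struct {
    PyArray_Descr base;
    /* ... other stringdtype fields ... */
    void *allocator;                    /* placeholder for the heap allocator state */
    PyThread_type_lock allocator_lock;  /* must be held while touching `allocator` */
} StringDTypeObject;

/* single-descriptor variants; the PR also defines 2- and 3-descriptor ones */
#define NPY_STRING_ACQUIRE_ALLOCATOR(descr) \
    PyThread_acquire_lock(((StringDTypeObject *)(descr))->allocator_lock, WAIT_LOCK)
#define NPY_STRING_RELEASE_ALLOCATOR(descr) \
    PyThread_release_lock(((StringDTypeObject *)(descr))->allocator_lock)

The lock itself would be created with PyThread_allocate_lock() when the descriptor is constructed and destroyed with PyThread_free_lock() in stringdtype_dealloc.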

There's nothing in the API that prevents someone from touching the allocator field before acquiring the allocator_lock. Maybe it would be nicer to wrap a type around the allocator, or is there a more idiomatic way to write this in C?

Most of this PR is then applying these macros where appropriate, including in error-handling cases. A few places where I didn't exit from error cases, as well as other similar bugs, are fixed too.
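To illustrate the error-handling point, here's a sketch of the pattern (the function and its signature are hypothetical; it uses the macros sketched above): every path out of the critical section has to release the mutex, which is easiest to guarantee with a single exit label.

static int
pack_string(PyArray_Descr *descr, PyObject *obj)
{
    int result = -1;

    NPY_STRING_ACQUIRE_ALLOCATOR(descr);
    if (obj == NULL) {
        /* error path: fall through to the release so the mutex isn't leaked */
        goto done;
    }
    /* ... use the allocator to pack the string into the array buffer ... */
    result = 0;
done:
    NPY_STRING_RELEASE_ALLOCATOR(descr);
    return result;
}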

I wasn't sure what to do in stringdtype_dealloc. Can we assume because of the GIL that a dtype is deallocated by only one thread?

I haven't benchmarked the single-threaded performance hit or added threaded tests, but I'm planning to add both before merging.

I based this implementation off of this tutorial.

@ngoldbaum (Member Author)

This also shows that putting a mutex on a dtype doesn't seem to be a big deal in the single-threaded case. For ufuncs and casts we can acquire and release the mutex outside the core loop, so it scales nicely too. I wonder if we could fix a ton of threading bugs in numpy by explicitly acquiring a per-dtype mutex before releasing the GIL.
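A sketch of that pattern for a strided inner loop (the loop body is elided and the function name is illustrative, but the shape follows numpy's experimental strided-loop API): the mutex is taken once per call to the inner loop rather than once per element.

static int
string_add_strided_loop(PyArrayMethod_Context *context, char *const data[],
                        npy_intp const dimensions[], npy_intp const strides[],
                        NpyAuxData *auxdata)
{
    PyArray_Descr *descr = context->descriptors[0];
    npy_intp N = dimensions[0];

    NPY_STRING_ACQUIRE_ALLOCATOR(descr);
    for (npy_intp i = 0; i < N; i++) {
        /* ... load both operands, concatenate, and store the result,
               all while the allocator lock is held ... */
    }
    NPY_STRING_RELEASE_ALLOCATOR(descr);
    return 0;
}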

@lysnikolaou (Member)

Can we assume because of the GIL that a dtype is deallocated by only one thread?

My understanding of the current version of PEP 703 is that yes, we can assume that.

@ngoldbaum (Member Author)

I just added a test for multithreaded access and mutation and things seem to be working correctly. I also fixed two unrelated bugs I encountered while writing that test, and fixed how I release the mutex: I need to account for cases where the input descriptors are identical objects and avoid acquiring or releasing the mutex more than once in those cases.
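The identical-descriptor case matters because a PyThread_type_lock is not re-entrant: acquiring the same lock twice from one thread deadlocks. A sketch of what the two-descriptor variants have to do (macro names are illustrative):

/* acquire both allocators, but only lock once if both descriptors are the same object */
#define NPY_STRING_ACQUIRE_ALLOCATOR2(descr1, descr2)      \
    do {                                                   \
        NPY_STRING_ACQUIRE_ALLOCATOR(descr1);              \
        if ((void *)(descr1) != (void *)(descr2)) {        \
            NPY_STRING_ACQUIRE_ALLOCATOR(descr2);          \
        }                                                  \
    } while (0)

#define NPY_STRING_RELEASE_ALLOCATOR2(descr1, descr2)      \
    do {                                                   \
        NPY_STRING_RELEASE_ALLOCATOR(descr1);              \
        if ((void *)(descr1) != (void *)(descr2)) {        \
            NPY_STRING_RELEASE_ALLOCATOR(descr2);          \
        }                                                  \
    } while (0)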

Review thread on the new test (diff context):

np.testing.assert_array_equal(arr, uarr)


def test_threaded_access_and_mutation(dtype, random_string_list):
@ngoldbaum (Member Author)
@seberg I based this off a test you wrote in numpy. Thanks for the example! Do you think there are any other sorts of operations that would be good to add to this test, or problems with this implementation? It definitely does catch threading errors: after adding it, I noticed that the original version of this PR wasn't releasing mutexes correctly.

seberg (Member)

IIRC this test was specifically designed for setitem/getitem, I think (maybe only setitem) with structured dtypes, because it is a hairy recursing monster.

I suppose it might be good to have a single test that runs a ufunc (reading) in one thread and assigns to the same array in another? To have one basic ufunc locking test? (of course you can also use a ufunc to assign to the same array with out= additionally)

seberg (Member)
Hmmm, I suppose it already runs a ufunc, so whatever :). (I half suspect I didn't write this, but took it/adjusted it from an issue :))

@ngoldbaum (Member Author), Nov 9, 2023
This is adapted from the structured dtype test, not just a copy/paste, so it's not quite the same.

Thanks for giving it a look!

BTW, once this PR is merged the stringdtype prototype is functionally done and I'm ready to move to integrating it into numpy. I'll probably update the NEP first though. Maybe I'll also include a call for testing the prototype when I update the mailing list?

@ngoldbaum (Member Author)

Performance impact for single-threaded use seems minimal. Here's a test on main with numpy and stringdtype compiled in release mode:

In [3]: arr = np.array(["hello", "world"], dtype=StringDType())

In [4]: %timeit np.add(arr, arr)
1.65 µs ± 33.8 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

And with this PR:

In [4]: arr = np.array(["hello", "world"], dtype=StringDType())

In [5]: %timeit np.add(arr, arr)
1.78 µs ± 20.2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

This kind of tiny array is the worst case for a ufunc too, since we acquire and release the mutex outside the inner loop.

@ngoldbaum merged commit da64667 into numpy:main on Nov 10, 2023 (1 check passed)
@seberg (Member) commented on Nov 10, 2023:

This kind of tiny array is the worst case for a ufunc

Not quite true; there are some cases where NumPy will opt to call the inner loop with few items, but many times. where= is the worst; the other is reductions.

But... whatever, the only alternative would be to use the auxdata to lock for the whole operation. And that might not even work easily, since we may be using casts at the same time.
