Hello,
I have been debugging a crash in our project that was caused by a GIL error. The code does something similar to the following:
#include <pybind11/pybind11.h>
#include <pybind11/embed.h>
#include <cassert>
#include <memory>
#include <optional>

namespace py = pybind11;

/// RAII utility type that guarantees that the GIL is unlocked for the scope of
/// the lifetime of the object.
///
/// This type is re-entrant.
///
/// postcondition: `GILRelease rel;` establishes `PyGILState_Check() == 0`
struct GILRelease
{
    inline GILRelease()
    {
        // Only release if this thread currently holds the GIL.
        if (PyGILState_Check() == 1)
            _release.emplace();
        assert(PyGILState_Check() == 0);
    }
    GILRelease(const GILRelease&) = delete;
    GILRelease& operator=(const GILRelease&) = delete;
    inline ~GILRelease()
    {
        // While the interpreter is finalizing, do not re-acquire the GIL.
        if (_release && _Py_IsFinalizing() != 0)
            _release->disarm();
    }
private:
    std::optional<py::gil_scoped_release> _release;
};

/// Deleter that deletes the pointer outside the GIL.
///
/// Useful for types that might deadlock on destruction if they keep the GIL
/// locked.
struct DeleteOutsideGIL
{
    template<typename T>
    void operator()(T* ptr) const
    {
        GILRelease unlock;
        delete ptr;
    }
};

PYBIND11_EMBEDDED_MODULE(test, m) {
    struct Cookies {};
    py::class_<Cookies, std::unique_ptr<Cookies, DeleteOutsideGIL>>(m, "Cookies")
        .def(py::init([]{
            return std::unique_ptr<Cookies, DeleteOutsideGIL>(new Cookies{}, DeleteOutsideGIL{});
        }));
}

int main(int argc, char **argv)
{
    py::scoped_interpreter interp;
    py::globals()["test"] = py::module::import("test");
    py::exec("cookies = test.Cookies()");
    return 0;
}
This declares a simple test module with an empty class that is held in a unique_ptr with a custom deleter, DeleteOutsideGIL, which ensures that the destructor of the C++ type runs outside of the GIL (destructors in our project can run potentially long or blocking operations).
This custom deleter makes use of a custom type GILRelease that is a wrapper on top of pybind11::gil_scoped_release, in an attempt to both:
- make sure it is re-entrant, since gil_scoped_release is not. In particular, the following code:
      py::gil_scoped_release rel1;
      {
          py::gil_scoped_release rel2;
          // ...
      }
  may lead to crashes, at least on CPython. This is unfortunate and is unacceptable to us.
- make sure to disarm the gil_scoped_release in the destructor if the interpreter is finalizing, since it is specified that the gil_scoped_release destructor may crash in that case.
Unfortunately, the example code above crashes with CPython 3.8. The crash happens because the Python interpreter is finalizing when our custom deleter is called (it is garbage collecting our cookies object), so the GILRelease constructor releases the GIL but the destructor does not reacquire it. When the deleter returns, the GIL is still unlocked, and calls to memory allocation then fail in the interpreter, with a call stack similar to the following:
Program received signal SIGSEGV, Segmentation fault.
#0 0x00007ffff7c06e1b in _PyErr_Restore (tstate=0x0, type=0x0, value=0x0, traceback=0x0) at ../Python/errors.c:53
#1 0x000055555555ecfa in pybind11::error_scope::~error_scope() ()
#2 0x000055555555c16f in pybind11::class_<pybind11_init_test(pybind11::module_&)::Cookies, std::unique_ptr<pybind11_init_test(pybind11::module_&)::Cookies, DeleteOutsideGIL> >::dealloc(pybind11::detail::value_and_holder&) ()
#3 0x0000555555567383 in pybind11::detail::clear_instance(_object*) ()
#4 0x0000555555567485 in pybind11_object_dealloc ()
#5 0x00007ffff7cc4155 in _Py_DECREF (filename=<synthetic pointer>, lineno=541, op=<optimized out>)
at ../Include/object.h:478
#6 _Py_XDECREF (op=<optimized out>) at ../Include/object.h:541
#7 free_keys_object (keys=0x7ffff6c503c0) at ../Objects/dictobject.c:584
#8 0x00007ffff7cc4808 in dictkeys_decref (dk=0x7ffff6c503c0) at ../Objects/dictobject.c:324
#9 dict_dealloc (mp=0x7ffff6bf86c0) at ../Objects/dictobject.c:1998
#10 0x00007ffff7cb5315 in _Py_DECREF (filename=<synthetic pointer>, lineno=541, op=<optimized out>)
at ../Include/object.h:478
#11 _Py_XDECREF (op=<optimized out>) at ../Include/object.h:541
#12 module_dealloc (m=0x7ffff6c22e50) at ../Objects/moduleobject.c:690
#13 0x00007ffff7cc6ab5 in _Py_DECREF (filename=<synthetic pointer>, lineno=541, op=<optimized out>)
at ../Objects/dictobject.c:1542
#14 _Py_XDECREF (op=<optimized out>) at ../Include/object.h:541
#15 insertdict (value=0x5555555af620 <_Py_NoneStruct>, hash=-3408259751683703317, key=0x7ffff6bd2a30,
mp=0x7ffff6c2cd00) at ../Objects/dictobject.c:1102
#16 PyDict_SetItem (op=0x7ffff6c2cd00, key=0x7ffff6bd2a30, value=0x5555555af620 <_Py_NoneStruct>)
at ../Objects/dictobject.c:1545
#17 0x00007ffff7bfc07e in PyImport_Cleanup () at ../Python/import.c:492
#18 0x00007ffff7be7600 in Py_FinalizeEx () at ../Python/pylifecycle.c:1229
#19 Py_FinalizeEx () at ../Python/pylifecycle.c:1150
#20 0x000055555556eef2 in pybind11::finalize_interpreter() ()
#21 0x000055555556f018 in pybind11::scoped_interpreter::~scoped_interpreter() ()
#22 0x000055555555bc79 in main ()
Since then, I have become confused about the usage of disarm on the GIL RAII types. The documentation and tests for it are scarce. Disarming a GIL release seems like an error to me, because it can leave the GIL in an inconsistent state. At this point, I feel the right thing to do would be to not release the GIL at all when the interpreter is finalizing, which would avoid the asymmetry that disarm causes.
Am I missing something? What is your opinion on this? Is there something I'm doing wrong that causes this issue?
Thank you.