-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG]: pybind11 2.10.0 - error_fetch_and_normalize changes conflicts with PyPy behavior for PyErr_NormalizeException #4075
Comments
Ping @rwgk |
Do you mean The main reason for the code quoted in the OP is to catch "errors in the error handling" as close to the root cause as possible. See the corresponding unit test: pybind11/tests/test_exceptions.py Line 311 in 59f03ee
Unfortunately there is no good way to know if the change of exception type is the result of a bug, or working as intended. My opinion: changing the exception type during normalization is evil. If something is counting on the exception to get normalized, it can do so straightaway. Otherwise, client code will have to accommodate two exception types for the same exception, depending on the control flow. I feel that's like pulling out the rug from underneath people's feet. — Another line of thought: deferred normalization was meant to be an optimization. An optimization is an optimization only if the rest of the logic/control flow works identically with and without. From the perspective of someone wanting to keep a large-scale environment healthy, I'm very reluctant to remove the quoted code. It is far more likely to catch bugs than to block something good and valid that could not easily be achieved differently. I believe, in this particular case, what's in the interest of keeping a large-scale environment healthy, is also in the interest of the world at large. The quoted code helps catching bugs closer to the root cause, saving a lot of time debugging. |
I read your comment and examined the code for PyErr_NormalizeException in both CPython and PyPy. In CPython, PyErr_NormalizeException permits an exception to change its class when normalizing. Nothing about it is treated as an error or warning; it's permitted. After all, any class that implements I agree it's anti-pattern to change the exception type, but I don't think pybind11 is the right tool to be enforcing this rule, especially by treating as a fatal error. It happens to be, as far as I can tell, an implementation detail that PyPy uses NormalizeException to transform OSError into its specific subclasses - CPython doesn't for that code path, but there may be other places where the standard library does similar things. It happens to be the first case that cropped up, but there's a lot of third party Python and C extension libraries that may depend on NormalizeException working the way it normally does in CPython, and will be tripped up if it happens to pass through the pybind11 exception handler. I think we can also agree that it's better to avoid #ifdef PYPY in pybind11 when it's avoidable, and I think there may be a way:
|
Maybe there is a misunderstanding, or another bug in your environment (PyPy)? — It's not fatal, but a pybind11/tests/test_exceptions.py Line 308 in 59f03ee
Oh! pybind11/tests/test_exceptions.py Line 306 in 59f03ee
Of course, it's right there! Note that I ran Google-global testing, exercising 800+ pybind11 extensions many (maybe most, idk) 3rd party, usually that is very conclusive. We've been using PR #1985 Google-internally since June 2 and there we no complaints at all. Let's wait a couple more weeks to see how many more complaints we get. If there is too much pushback, we could make this opt-in vs always enabled. At the moment this bug is the only complaint I know about.
Sounds good. Do you want to try?
I believe they didn't think it through. They wanted to optimize, but forgot to handle the distinction between "accident/bug" and "intent". While working on #1895 I searched the documentation for |
… segfault with unknown root cause. Change prompted by pybind#4075
FYI: I'm testing #4079 , but it's late here. I'll try to finish the PR asap and will explain then. |
No, false! The claim that we are treating a change in exception type "as a fatal error" completely got me on the wrong track. It turns out to be a clean https://github.com/pybind/pybind11/runs/7385124173 You could adjust to that by catching If PyPy is then changed to normalize the error immediately, instead of relying on pybind11 to do that for PyPy, the world at large will be in a better state than it was before. Working on this bug reminded me that this pybind11/tests/test_exceptions.py Line 306 in 59f03ee
was needed even before #1895 was merged. Fixing the root cause for that segfault would definitely be a good thing. As-is, PyPy appears to mask errors in the error handling with a segfault. |
Apologies about the "fatal error" comment - I mistook pybind_fail for meaning that the interpreter is terminated. For pybind11 2.10 replaces an OSError from PyPy with an RuntimeError - I can't agree this is a clean solution. It replaces with the expected exception, an OSError, with a RuntimeError. That is an API breaking change and a regression from 2.9. I don't think it's reasonable to ask client code to introduce special, hacky, PyPy-specifc exception handling everywhere until PyPy addresses hte underlying issue. I agree with it does make sense to break client code when it's clearly doing something wrong, dangerous and most importantly fixable by client code, but that's not the case here. The code that's presumably broken is in PyPy (or perhaps cpyext). That segfault is also worrying. But client code can't fix the fundamental issues since those issues aren't in client code. I don't think it makes sense to punish client code over the issue. Everyone who supports PyPy also supports CPython, so any code is raising denormalized polymorphic exceptions it will get flagged by the MISMATCH test on the CPython side. That will prove safety and allow PyPy to work. I think that leaves two options:
|
@rwgk Could this segfault be related to the fact PyErr_NormalizeException decrefs the last two arguments, and we don't inrec them? See this comment: #1895 (comment) . It also demonstrates the very small perf penalty of normalizing the eagerly if that would solve the issue. |
Hm... I think we have been calling
There was a lot of back and forth on this (even just in my own thinking, but also between reviewers), but in the end we settled on immediate normalization: pybind11/include/pybind11/pytypes.h Lines 425 to 430 in ef7d971
|
Yes. I'll do the
No! :-) My interest: guard against masking errors in the error handling, because I know from direct experience how bad that can be ("extremely expensive to debug"). But I don't want to break anything that doesn't share that interest, if I can help it, which I believe the
I never meant to suggest it is a clean solution, I know it is not. The root of the problem is the missing intent vs bug distinction in
Of course that's no good for someone merely wanting to use pybind11 & PyPy in combination, but not wanting to get into PyPy to add the immediate normalization there. I think the |
…#4079) * For PyPy only, re-enable old behavior (likely to mask bugs), to avoid segfault with unknown root cause. Change prompted by #4075 * Undo the change in tests/test_exceptions.py I turns out (I forgot) that PyPy segfaults in `test_flaky_exception_failure_point_init` already before the `MISMATCH` code path is reached: https://github.com/pybind/pybind11/runs/7383663596 ``` RPython traceback: test_exceptions.py .......X.........Error in cpyext, CPython compatibility layer: File "pypy_module_cpyext.c", line 14052, in wrapper_second_level__star_3_1 File "pypy_module_cpyext_1.c", line 35750, in not_supposed_to_fail Fatal Python error: Segmentation fault Stack (most recent call first, approximate line numbers): File "/home/runner/work/pybind11/pybind11/tests/test_exceptions.py", line 306 in test_flaky_exception_failure_point_init The function PyErr_NormalizeException was not supposed to fail File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/_pytest/python.py", line 185 in pytest_pyfunc_call File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/pluggy/_callers.py", line 9 in _multicall File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/pluggy/_manager.py", line 77 in _hookexec File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/pluggy/_hooks.py", line 244 in __call__ File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/_pytest/python.py", line 1716 in runtest File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/_pytest/runner.py", line 159 in pytest_runtest_call File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/pluggy/_callers.py", line 9 in _multicall File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/pluggy/_manager.py", line 77 in _hookexec Fatal error in cpyext, CPython compatibility layer, calling PyErr_NormalizeException Either report a bug or consider not using this particular extension <SystemError object at 0x7fcc8cea6868> File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/pluggy/_hooks.py", line 244 in __call__ File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/_pytest/runner.py", line 261 in <lambda> File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/_pytest/runner.py", line 317 in from_call File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/_pytest/runner.py", line 246 in call_runtest_hook File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/_pytest/runner.py", line 218 in call_and_report File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/_pytest/runner.py", line 118 in runtestprotocol File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/_pytest/runner.py", line 110 in pytest_runtest_protocol File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/pluggy/_callers.py", line 9 in _multicall File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/pluggy/_manager.py", line 77 in _hookexec File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/pluggy/_hooks.py", line 244 in __call__ File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/_pytest/main.py", line 335 in pytest_runtestloop File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/pluggy/_callers.py", line 9 in _multicall File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/pluggy/_manager.py", line 77 in _hookexec File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/pluggy/_hooks.py", line 244 in __call__ File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/_pytest/main.py", line 318 in _main File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/_pytest/main.py", line 255 in wrap_session File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/_pytest/main.py", line 314 in pytest_cmdline_main File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/pluggy/_callers.py", line 9 in _multicall File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/pluggy/_manager.py", line 77 in _hookexec File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/pluggy/_hooks.py", line 244 in __call__ File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/_pytest/config/__init__.py", line 133 in main File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/_pytest/config/__init__.py", line 181 in console_main File "/opt/hostedtoolcache/PyPy/3.7.13/x64/site-packages/pytest/__main__.py", line 1 in <module> File "/opt/hostedtoolcache/PyPy/3.7.13/x64/lib-python/3/runpy.py", line 62 in _run_code File "/opt/hostedtoolcache/PyPy/3.7.13/x64/lib-python/3/runpy.py", line 170 in _run_module_as_main File "<builtin>/app_main.py", line 109 in run_toplevel File "<builtin>/app_main.py", line 652 in run_command_line File "<builtin>/app_main.py", line 996 in entry_point Segmentation fault (core dumped) ``` * Add test_pypy_oserror_normalization * Disable new `PYPY_VERSION` `#if`, to verify that the new test actually fails. * Restore PYPY_VERSION workaround and update comment to reflect what was learned. * [ci skip] Fix trivial oversight in comment.
Sorry for delay in respond. I'll submit a quick PR to clarify the comments many types of OSError may do this, not just FileNotFoundError. |
Reported to PyPy over here. |
Hi. I am really confused. It seems there are at least two issues here:
Do I understand correctly up to here? If so read on. If not please correct me. The first one is a PyPy bug, and PyPy (i.e. me) should fix it. Do you know what C-API call is being used? The second one seems to be that PyPy's implementation of |
I'm not too sure about the exact internal steps, too, but yes, at the moment when
The
It was "always" that way, but PR #1895 changed the behavior of pybind11, for safety. It now does this: pybind11/include/pybind11/pytypes.h Lines 483 to 485 in 9a29637
Instead of the
I never looked deeper than what I described above.
I don't think that ever was a problem for pybind11, before and after PR #1895. From my limited understanding / biased viewpoint: PyPy seems to "exploit" A related but separate PyPy problem that existed before and still exists after PR #1895: https://github.com/pybind/pybind11/blob/master/tests/test_exceptions.py#L306 It would be great for PyPy users if that got fixed. As-is, errors in the error handling (the unit test) lead to a PyPy segfault. I'm guessing that could be very time consuming to debug, especially if the error condition is difficult to reproduce, which I've seen quite often for production jobs (as a general observation, I don't think Google is using PyPy in production). |
PyPy now emits the proper normalized error in nightlies, and this will be part of the 7.3.10 release. Thanks for pointing it out. |
I added a PyPy-nightly run to the binary-run repo where I test c-extension packages. ubuntu and windows are passing, macos is failing this test. I wonder what is different about macOS. Using clang instead of gcc/MSVC maybe? |
Yes, most likely: I've only ever seen clang being used when testing pybind11 on macOS. Maybe related: Apple's versioning doesn't seem to match "normal" clang versioning. (It's one of those things I've been confused about but never got a chance to drill down.) |
@rwgk Apple's version = Clang version - 2, usually. |
Interesting ... thanks! |
Required prerequisites
Problem description
PyPy 3.8 triggers the newly introduced error message
pybind11/include/pybind11/pytypes.h
Lines 459 to 469 in 59f03ee
In PyPy,
PyErr_NormalizeException
causes the "raw" OSError to be normalized into FileNotFoundError. Which seems fair enough.It's unclear to me whether PyPy is doing something "naughty" here that violates the expectations set by CPython, or if this new exception handling code is just imposing an expectation that doesn't hold for PyPy. As such I don't know if simply disabling that test is appropriate for PyPy.
CPython does not make the same substitution. There is no error (and the code behaves correctly) with pybind11 2.9.
Reproducible example code
The text was updated successfully, but these errors were encountered: