Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
We don't handle well a few scenarios involving vfork. The reason is that dbgapi can't attach to a vfork child when it is still attached to the parent. For example, with a simple program that just calls "vfork": $ ./gdb -q -nx --data-directory=data-directory vfork -ex "set follow-fork-mode child" -ex r Reading symbols from vfork... Starting program: /home/master/smarchi/build/binutils-gdb/gdb/vfork [Attaching after process 14346 vfork to child process 14354] [New inferior 2 (process 14354)] amd-dbgapi: fatal error: enable_debug failed (rc=-1) Backtrace: #0 0x00007f40337a2a14 amd::dbgapi::process_t::attach() in /home/master/smarchi/src/ROCdbgapi/src/process.cpp:1504 #1 0x00007f40337a2be2 operator() in /home/master/smarchi/src/ROCdbgapi/src/process.cpp:2142 #2 0x00007f40337a2e46 tracer_closure in /home/master/smarchi/src/ROCdbgapi/src/logging.h:538 #3 0x00007f40337a2e46 enter<amd::dbgapi::detail::parameter_t<amd_dbgapi_client_process_s*, (& amd::dbgapi::utils::string_literal<'c', 'l', 'i', 'e', 'n', 't', '_', 'p', 'r', 'o', 'c', 'e', 's', 's', '_', 'i', 'd'>::value), (amd::dbgapi::detail::parameter_kind_t)0>, amd::dbgapi::detail::parameter_t<amd_dbgapi_process_id_t*, (& amd::dbgapi::utils::string_literal<'p', 'r', 'o', 'c', 'e', 's', 's', '_', 'i', 'd'>::value), (amd::dbgapi::detail::parameter_kind_t)0>, amd_dbgapi_process_attach(amd_dbgapi_client_process_id_t, amd_dbgapi_process_id_t*)::<lambda()> > in /home/master/smarchi/src/ROCdbgapi/src/logging.h:585 #4 0x00007f40337a2e46 amd_dbgapi_process_attach in /home/master/smarchi/src/ROCdbgapi/src/process.cpp:2154 #5 0x0000000000a63164 rocm_enable(inferior*) in /home/master/smarchi/src/binutils-gdb/gdb/rocm-tdep.c:1700 #6 0x0000000000a63553 rocm_target_ops::follow_fork(inferior*, ptid_t, target_waitkind, bool, bool) in /home/master/smarchi/src/binutils-gdb/gdb/rocm-tdep.c:2139 #7 0x0000000000b411d4 target_follow_fork(inferior*, ptid_t, target_waitkind, bool, bool) in /home/master/smarchi/src/binutils-gdb/gdb/target.c:2756 #8 0x0000000000865137 follow_fork_inferior(bool, bool) in /home/master/smarchi/src/binutils-gdb/gdb/infrun.c:588 #9 0x0000000000856c10 follow_fork() in /home/master/smarchi/src/binutils-gdb/gdb/infrun.c:760 #10 0x000000000085e0e2 handle_inferior_event(execution_control_state*) in /home/master/smarchi/src/binutils-gdb/gdb/infrun.c:5549 #11 0x000000000085c685 fetch_inferior_event() in /home/master/smarchi/src/binutils-gdb/gdb/infrun.c:4077 #12 0x000000000083b1d3 inferior_event_handler(inferior_event_type) in /home/master/smarchi/src/binutils-gdb/gdb/inf-loop.c:41 #13 0x00000000008ab957 handle_target_event(int, void*) in /home/master/smarchi/src/binutils-gdb/gdb/linux-nat.c:4208 #14 0x0000000000dccc02 handle_file_event(file_handler*, int) in /home/master/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:575 #15 0x0000000000dcbaeb gdb_wait_for_event(int) in /home/master/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:701 #16 0x0000000000dcb57c gdb_do_one_event() in /home/master/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:212 #17 0x0000000000b71b7c wait_sync_command_done() in /home/master/smarchi/src/binutils-gdb/gdb/top.c:526 #18 0x0000000000b71c3d maybe_wait_sync_command_done(int) in /home/master/smarchi/src/binutils-gdb/gdb/top.c:543 #19 0x0000000000b72431 execute_command(char const*, int) in /home/master/smarchi/src/binutils-gdb/gdb/top.c:674 #20 0x00000000008e10da catch_command_errors(void (*)(char const*, int), char const*, int, bool) in /home/master/smarchi/src/binutils-gdb/gdb/main.c:523 #21 0x00000000008e1274 execute_cmdargs(std::vector<cmdarg, std::allocator<cmdarg> > const*, cmdarg_kind, cmdarg_kind, int*) in /home/master/smarchi/src/binutils-gdb/gdb/main.c:618 #22 0x00000000008e0d65 captured_main_1(captured_main_args*) in /home/master/smarchi/src/binutils-gdb/gdb/main.c:1317 #23 0x00000000008de70c captured_main(void*) in /home/master/smarchi/src/binutils-gdb/gdb/main.c:1338 #24 0x00000000008de664 gdb_main(captured_main_args*) in /home/master/smarchi/src/binutils-gdb/gdb/main.c:1363 #25 0x0000000000427406 main in /home/master/smarchi/src/binutils-gdb/gdb/gdb.c:32 #26 0x00007f40331f80b2 __libc_start_main #27 0x00000000004272ed _start #28 0xffffffffffffffff Could not attach to process 14354 (rc=-2) - With "detach-on-fork on" and "follow-fork-mode child", we try to attach the child before detaching the parent, which fails. This might be a consequence of the recent follow_fork changes: before that, GDB used to detach from the parent before attaching the child. - With "detach-on-fork on" and "follow-fork-mode parent", there's not problem as we never try to attach to the child. - With "detach-on-fork off" and "follow-fork-mode child", we try to attach the child while staying attached to the parent, so that fails. - With "detach-on-fork off" and "follow-fork-mode parent" (and "set schedule-multiple on", to get around the message that GDB prints), same thing. To avoid these vfork-related problems, I propose to not enable / push the rocm target for vfork children. A vfork child is only allowed to do a limited set of thing, presumably to avoid messing up its parent's address space. As documented in vfork(2): (From POSIX.1) The vfork() function has the same effect as fork(2), except that the be‐ havior is undefined if the process created by vfork() either modifies any data other than a variable of type pid_t used to store the return value from vfork(), or returns from the function in which vfork() was called, or calls any other function before suc‐ cessfully calling _exit(2) or one of the exec(3) family of functions. Clearly, a vfork child won't run GPU programs, as that would require doing things that are not allowed at that time. So we won't miss anything by not attaching dbgapi to the fork child. If the program execs, though, then it might run some GPU programs in the new address space, so we need to make sure to push the rocm target and attach dbgapi when we catch the exec. The following modifications are made: - in rocm_enable, don't push / attach if inf->vfork_parent is set. That is useful in the case of "detach-on-fork off" - in rocm_target_ops::follow_fork, don't call rocm_enable if the fork kind is vfork. This change might seem redundant with the previous one, but both are needed to cover all cases. - A vfork child will not have the rocm target pushed, so if it subsequently execs, rocm_target_ops::follow_exec won't be called. To catch that event and push the rocm target at that point, implement the inferior_execd observer. With just these changes, we hit this internal error: (gdb) PASS: gdb.base/foll-vfork.exp: exec: vfork child follow, finish after tcatch vfork: continue to vfork finish^M Run till exit from #0 0x00007ffff7eb325c in vfork () from /lib/x86_64-linux-gnu/libc.so.6^M [Attaching after process 7816 vfork to child process 7824]^M [New inferior 2 (process 7824)]^M /home/master/smarchi/src/binutils-gdb/gdb/rocm-tdep.c:559: internal-error: void async_event_handler_clear(): Assertion `rocm_async_event_handler != nullptr' failed.^M A problem internal to GDB has been detected,^M further debugging may prove unreliable.^M Quit this debugging session? (y or n) FAIL: gdb.base/foll-vfork.exp: exec: vfork child follow, finish after tcatch vfork: finish (GDB internal error) Because we now don't push the rocm target in vfork child inferiors, we can end up with two inferiors, with the following targets: - inferior 1, rocm target + linux nat target - inferior 2, linux nat target If we call target_async(true) while inferior 2 is the current inferior, the linux nat target (which is shared by the two inferiors) will enable its async mode, but the rocm target won't. If we then call target_wait with inferior 1 as the current inferior (so end up in rocm_target_ops::wait), the rocm target thinks it is async, when in reality it is not, so we end up calling async_event_handler_clear while rocm_async_event_handler is unset. I think that the problem is that we can have target stack that are partially async, with some targets that have received the order to become async and other targets that haven't. I don't think this is specific to this patch. We could have hit this problem if we had decided that the the ROCm target would be pushed after a specific library load, for example. To fix that, modify target_async such we call target_async on all the inferiors that have the current process target as their process target. This ensures that all targets in all target stacks that have the linux nat target in them (in the example above) get the memo that they should enable their async mode. Change-Id: I4d8102c65cc6b1663a46a49b5490c0f2c91f6279
- Loading branch information