Skip to content

Commit

Permalink
gdb/rocm: fix vfork handling
Browse files Browse the repository at this point in the history
We don't handle well a few scenarios involving vfork.  The reason is
that dbgapi can't attach to a vfork child when it is still attached to
the parent.

For example, with a simple program that just calls "vfork":

    $ ./gdb -q -nx --data-directory=data-directory vfork -ex "set follow-fork-mode child" -ex r
    Reading symbols from vfork...
    Starting program: /home/master/smarchi/build/binutils-gdb/gdb/vfork
    [Attaching after process 14346 vfork to child process 14354]
    [New inferior 2 (process 14354)]
    amd-dbgapi: fatal error: enable_debug failed (rc=-1)
    Backtrace:
        #0 0x00007f40337a2a14 amd::dbgapi::process_t::attach() in /home/master/smarchi/src/ROCdbgapi/src/process.cpp:1504
        #1 0x00007f40337a2be2 operator() in /home/master/smarchi/src/ROCdbgapi/src/process.cpp:2142
        #2 0x00007f40337a2e46 tracer_closure in /home/master/smarchi/src/ROCdbgapi/src/logging.h:538
        #3 0x00007f40337a2e46 enter<amd::dbgapi::detail::parameter_t<amd_dbgapi_client_process_s*, (& amd::dbgapi::utils::string_literal<'c', 'l', 'i', 'e', 'n', 't', '_', 'p', 'r', 'o', 'c', 'e', 's', 's', '_', 'i', 'd'>::value), (amd::dbgapi::detail::parameter_kind_t)0>, amd::dbgapi::detail::parameter_t<amd_dbgapi_process_id_t*, (& amd::dbgapi::utils::string_literal<'p', 'r', 'o', 'c', 'e', 's', 's', '_', 'i', 'd'>::value), (amd::dbgapi::detail::parameter_kind_t)0>, amd_dbgapi_process_attach(amd_dbgapi_client_process_id_t, amd_dbgapi_process_id_t*)::<lambda()> > in /home/master/smarchi/src/ROCdbgapi/src/logging.h:585
        #4 0x00007f40337a2e46 amd_dbgapi_process_attach in /home/master/smarchi/src/ROCdbgapi/src/process.cpp:2154
        #5 0x0000000000a63164 rocm_enable(inferior*) in /home/master/smarchi/src/binutils-gdb/gdb/rocm-tdep.c:1700
        #6 0x0000000000a63553 rocm_target_ops::follow_fork(inferior*, ptid_t, target_waitkind, bool, bool) in /home/master/smarchi/src/binutils-gdb/gdb/rocm-tdep.c:2139
        #7 0x0000000000b411d4 target_follow_fork(inferior*, ptid_t, target_waitkind, bool, bool) in /home/master/smarchi/src/binutils-gdb/gdb/target.c:2756
        #8 0x0000000000865137 follow_fork_inferior(bool, bool) in /home/master/smarchi/src/binutils-gdb/gdb/infrun.c:588
        #9 0x0000000000856c10 follow_fork() in /home/master/smarchi/src/binutils-gdb/gdb/infrun.c:760
        #10 0x000000000085e0e2 handle_inferior_event(execution_control_state*) in /home/master/smarchi/src/binutils-gdb/gdb/infrun.c:5549
        #11 0x000000000085c685 fetch_inferior_event() in /home/master/smarchi/src/binutils-gdb/gdb/infrun.c:4077
        #12 0x000000000083b1d3 inferior_event_handler(inferior_event_type) in /home/master/smarchi/src/binutils-gdb/gdb/inf-loop.c:41
        #13 0x00000000008ab957 handle_target_event(int, void*) in /home/master/smarchi/src/binutils-gdb/gdb/linux-nat.c:4208
        #14 0x0000000000dccc02 handle_file_event(file_handler*, int) in /home/master/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:575
        #15 0x0000000000dcbaeb gdb_wait_for_event(int) in /home/master/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:701
        #16 0x0000000000dcb57c gdb_do_one_event() in /home/master/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:212
        #17 0x0000000000b71b7c wait_sync_command_done() in /home/master/smarchi/src/binutils-gdb/gdb/top.c:526
        #18 0x0000000000b71c3d maybe_wait_sync_command_done(int) in /home/master/smarchi/src/binutils-gdb/gdb/top.c:543
        #19 0x0000000000b72431 execute_command(char const*, int) in /home/master/smarchi/src/binutils-gdb/gdb/top.c:674
        #20 0x00000000008e10da catch_command_errors(void (*)(char const*, int), char const*, int, bool) in /home/master/smarchi/src/binutils-gdb/gdb/main.c:523
        #21 0x00000000008e1274 execute_cmdargs(std::vector<cmdarg, std::allocator<cmdarg> > const*, cmdarg_kind, cmdarg_kind, int*) in /home/master/smarchi/src/binutils-gdb/gdb/main.c:618
        #22 0x00000000008e0d65 captured_main_1(captured_main_args*) in /home/master/smarchi/src/binutils-gdb/gdb/main.c:1317
        #23 0x00000000008de70c captured_main(void*) in /home/master/smarchi/src/binutils-gdb/gdb/main.c:1338
        #24 0x00000000008de664 gdb_main(captured_main_args*) in /home/master/smarchi/src/binutils-gdb/gdb/main.c:1363
        #25 0x0000000000427406 main in /home/master/smarchi/src/binutils-gdb/gdb/gdb.c:32
        #26 0x00007f40331f80b2 __libc_start_main
        #27 0x00000000004272ed _start
        #28 0xffffffffffffffff
    Could not attach to process 14354 (rc=-2)

 - With "detach-on-fork on" and "follow-fork-mode child", we
   try to attach the child before detaching the parent, which fails.  This
   might be a consequence of the recent follow_fork changes: before that,
   GDB used to detach from the parent before attaching the child.

 - With "detach-on-fork on" and "follow-fork-mode parent", there's not
   problem as we never try to attach to the child.

 - With "detach-on-fork off" and "follow-fork-mode child", we try to attach
   the child while staying attached to the parent, so that fails.

 - With "detach-on-fork off" and "follow-fork-mode parent" (and "set
   schedule-multiple on", to get around the message that GDB prints), same
   thing.

To avoid these vfork-related problems, I propose to not enable / push
the rocm target for vfork children.  A vfork child is only allowed to do
a limited set of thing, presumably to avoid messing up its parent's
address space.  As documented in vfork(2):

    (From POSIX.1) The vfork() function has the same effect as fork(2), except that the be‐
    havior is undefined if the process created by vfork() either modifies  any  data  other
    than  a  variable of type pid_t used to store the return value from vfork(), or returns
    from the function in which vfork() was called, or calls any other function before  suc‐
    cessfully calling _exit(2) or one of the exec(3) family of functions.

Clearly, a vfork child won't run GPU programs, as that would require
doing things that are not allowed at that time.  So we won't miss
anything by not attaching dbgapi to the fork child.  If the program
execs, though, then it might run some GPU programs in the new address
space, so we need to make sure to push the rocm target and attach dbgapi
when we catch the exec.

The following modifications are made:

  - in rocm_enable, don't push / attach if inf->vfork_parent is set.
    That is useful in the case of "detach-on-fork off"
  - in rocm_target_ops::follow_fork, don't call rocm_enable if the fork
    kind is vfork.  This change might seem redundant with the previous
    one, but both are needed to cover all cases.
  - A vfork child will not have the rocm target pushed, so if it
    subsequently execs, rocm_target_ops::follow_exec won't be called.
    To catch that event and push the rocm target at that point,
    implement the inferior_execd observer.

With just these changes, we hit this internal error:

    (gdb) PASS: gdb.base/foll-vfork.exp: exec: vfork child follow, finish after tcatch vfork: continue to vfork
    finish^M
    Run till exit from #0  0x00007ffff7eb325c in vfork () from /lib/x86_64-linux-gnu/libc.so.6^M
    [Attaching after process 7816 vfork to child process 7824]^M
    [New inferior 2 (process 7824)]^M
    /home/master/smarchi/src/binutils-gdb/gdb/rocm-tdep.c:559: internal-error: void async_event_handler_clear(): Assertion `rocm_async_event_handler != nullptr' failed.^M
    A problem internal to GDB has been detected,^M
    further debugging may prove unreliable.^M
    Quit this debugging session? (y or n) FAIL: gdb.base/foll-vfork.exp: exec: vfork child follow, finish after tcatch vfork: finish (GDB internal error)

Because we now don't push the rocm target in vfork child inferiors, we
can end up with two inferiors, with the following targets:

 - inferior 1, rocm target + linux nat target
 - inferior 2, linux nat target

If we call target_async(true) while inferior 2 is the current inferior,
the linux nat target (which is shared by the two inferiors) will enable
its async mode, but the rocm target won't.  If we then call target_wait
with inferior 1 as the current inferior (so end up in
rocm_target_ops::wait), the rocm target thinks it is async, when in
reality it is not, so we end up calling async_event_handler_clear while
rocm_async_event_handler is unset.

I think that the problem is that we can have target stack that are
partially async, with some targets that have received the order to
become async and other targets that haven't.  I don't think this is
specific to this patch.  We could have hit this problem if we had
decided that the the ROCm target would be pushed after a specific
library load, for example.

To fix that, modify target_async such we call target_async on all the
inferiors that have the current process target as their process target.
This ensures that all targets in all target stacks that have the linux
nat target in them (in the example above) get the memo that they should
enable their async mode.

Change-Id: I4d8102c65cc6b1663a46a49b5490c0f2c91f6279
  • Loading branch information
Simon Marchi committed Aug 27, 2021
1 parent 827562a commit ff546ec
Show file tree
Hide file tree
Showing 5 changed files with 168 additions and 4 deletions.
27 changes: 24 additions & 3 deletions gdb/rocm-tdep.c
Original file line number Diff line number Diff line change
Expand Up @@ -1687,6 +1687,17 @@ rocm_enable (inferior *inf)
return;
}

/* dbgapi can't attach to a vfork child (a process born from a vfork that
hasn't exec'ed yet) while we are still attached to the parent. It would
not be useful for us to attach to vfork children anyway, because vfork
children are very restricted in what they can do (see vfork(2)) and aren't
going to launch some GPU programs that we need to debug. To avoid this
problem, we don't push the rocm target / attach dbgapi in vfork children.
If a vfork child execs, we'll try enabling the rocm target through the
inferior_execd observer. */
if (inf->vfork_parent != nullptr)
return;

auto *info = get_rocm_inferior_info (inf);

/* Are we already attached? */
Expand Down Expand Up @@ -2117,6 +2128,12 @@ rocm_target_ops::follow_exec (inferior *follow_inf, ptid_t ptid,
rocm_enable (follow_inf);
}

static void
rocm_inferior_execd (inferior *inf)
{
rocm_enable (inf);
}

void
rocm_target_ops::follow_fork (inferior *child_inf, ptid_t child_ptid,
target_waitkind fork_kind, bool follow_child,
Expand All @@ -2134,9 +2151,12 @@ rocm_target_ops::follow_fork (inferior *child_inf, ptid_t child_ptid,
child_info->precise_memory.requested
= parent_info->precise_memory.requested;

scoped_restore_current_thread restore_thread;
switch_to_thread (*child_inf->threads ().begin ());
rocm_enable (child_inf);
if (fork_kind != TARGET_WAITKIND_VFORKED)
{
scoped_restore_current_thread restore_thread;
switch_to_thread (*child_inf->threads ().begin ());
rocm_enable (child_inf);
}
}
}

Expand Down Expand Up @@ -3565,6 +3585,7 @@ _initialize_rocm_tdep ()
gdb::observers::signal_received.attach (rocm_target_signal_received,
"rocm-tdep");
gdb::observers::normal_stop.attach (rocm_target_normal_stop, "rocm-tdep");
gdb::observers::inferior_execd.attach (rocm_inferior_execd, "rocm-tdep");

create_internalvar_type_lazy ("_wave_id", &rocm_wave_id_funcs, NULL);

Expand Down
12 changes: 11 additions & 1 deletion gdb/target.c
Original file line number Diff line number Diff line change
Expand Up @@ -4378,7 +4378,17 @@ void
target_async (int enable)
{
infrun_async (enable);
current_inferior ()->top_target ()->async (enable);

process_stratum_target *proc_target = current_inferior ()->process_target ();
scoped_restore_current_thread restore_thread;

for (inferior *inf : all_inferiors (proc_target))
{
if (current_inferior () != inf)
switch_to_inferior_no_thread (inf);

inf->top_target ()->async (enable);
}
}

/* See target.h. */
Expand Down
30 changes: 30 additions & 0 deletions gdb/testsuite/gdb.rocm/fork-exec-execee.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
/* Copyright (C) 2021 Advanced Micro Devices, Inc. All rights reserved.
This file is part of GDB.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>. */

#include <hip/hip_runtime.h>

__global__ void
kernel ()
{}

int
main (int argc, char* argv[])
{
hipLaunchKernelGGL (kernel, dim3 (1), dim3 (1), 0, 0);
hipDeviceSynchronize ();
return 0;
}
41 changes: 41 additions & 0 deletions gdb/testsuite/gdb.rocm/fork-exec-execer.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
/* Copyright (C) 2021 Advanced Micro Devices, Inc. All rights reserved.
This file is part of GDB.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>. */

#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

int
main (int argc, char* argv[])
{
/* FORK is defined to fork or vfork by the test. */
int pid = FORK ();
if (pid != 0)
{
/* Parent. */
}
else
{
/* EXECEE is defined by the test. */
int ret = execl (EXECEE, EXECEE, NULL);
perror ("exec");
abort ();
}

return 0;
}
62 changes: 62 additions & 0 deletions gdb/testsuite/gdb.rocm/fork-exec.exp
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
# Copyright (C) 2021 Advanced Micro Devices, Inc. All rights reserved.

# This file is part of GDB.

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.

# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

# Verify that we can debug a GPU program in a child after a (v)fork + exec.

load_lib rocm.exp

standard_testfile -execer.cpp -execee.cpp

if [skip_hipcc_tests] {
verbose "Skipping hip test: ${testfile}."
return 0
}

set srcfile_execer "$srcfile"
set srcfile_execee "$srcfile2"
set binfile_execee "$binfile-execee"

# Compile two versions of execer, one that uses fork and one that uses vfork.
foreach_with_prefix fork_func { fork vfork } {
set opts [list additional_flags=-DFORK=$fork_func \
additional_flags=-DEXECEE="${::binfile_execee}"]
if {[build_executable "failed to prepare" ${::binfile}-execer-${fork_func} \
$srcfile_execer $opts]} {
return
}
}

if {[build_executable "failed to prepare" $binfile_execee $srcfile_execee \
{debug hip}]} {
return
}

proc do_test { detach_on_fork fork_func } {
with_rocm_gpu_lock {
clean_restart ${::binfile}-execer-${fork_func}

gdb_test_no_output "set detach-on-fork ${detach_on_fork}"
gdb_test_no_output "set follow-fork-mode child"
runto kernel allow-pending message
}
}

foreach_with_prefix detach-on-fork { on off } {
foreach_with_prefix fork_func { fork vfork } {
do_test ${detach-on-fork} $fork_func
}
}

0 comments on commit ff546ec

Please sign in to comment.