
debugger

Mark Gates edited this page Jul 13, 2023 · 1 revision


Debugging C++ exceptions

I don't claim to be an expert at using command-line debuggers (gdb, lldb), but they are useful for finding where code throws exceptions or segfaults. debug-exception.cc is a simple test program with functions foo1, foo2, foo3, and foo4. Passing in 1 makes foo1 throw, 2 makes foo2 throw, and so on. Just for fun it uses OpenMP, so there are multiple threads, too, but that doesn't really change anything. This example uses lldb on macOS; gdb has analogous functionality but different syntax. I added the ### comments to the transcripts below.
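The source of debug-exception.cc isn't shown on this page. The following is a hypothetical reconstruction that is consistent with the output and backtraces below: foo4 runs an OpenMP parallel loop that calls foo3, which calls foo2, which calls foo1, and fooK throws when its argument equals K. The loop count of 10, the use of std::runtime_error, and the message text are assumptions, not the original file.

```cpp
// Hypothetical reconstruction of debug-exception.cc (not the original file).
// fooK throws when n == K. Compile with -fopenmp; without it the pragma is
// ignored and every call reports tid 0.
#include <cstdio>
#include <stdexcept>
#ifdef _OPENMP
#include <omp.h>
#endif

void foo1( int n, int tid )
{
    std::printf( "foo1( %d, tid %d )\n", n, tid );
    if (n == 1)
        throw std::runtime_error( "error in foo1" );
}

void foo2( int n, int tid )
{
    std::printf( "foo2( %d, tid %d )\n", n, tid );
    if (n == 2)
        throw std::runtime_error( "error in foo2" );
    foo1( n, tid );
}

void foo3( int n, int tid )
{
    std::printf( "foo3( %d, tid %d )\n", n, tid );
    if (n == 3)
        throw std::runtime_error( "error in foo3" );
    foo2( n, tid );
}

void foo4( int n )
{
    std::printf( "foo4( %d )\n", n );
    if (n == 4)
        throw std::runtime_error( "error in foo4" );

    // Assumed: 10 iterations spread over the OpenMP threads.
    #pragma omp parallel for
    for (int iter = 0; iter < 10; ++iter) {
        int tid = 0;
        #ifdef _OPENMP
        tid = omp_get_thread_num();
        #endif
        foo3( n, tid );
    }
}
```

A main that prints main( n ) and calls foo4( atoi( argv[1] ) ) completes the program.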

You may have to play around with the syntax. I figured this out from https://stackoverflow.com/questions/8122375/lldb-breakpoint-on-exceptions-equivalent-of-gdbs-catch-throw, but several of the syntaxes given there didn't work for me, maybe because of different lldb versions. In gdb, the equivalent is "catch throw".

That said, I agree that we should throw exceptions that carry more useful information. That CUDA code may predate slate_cuda_call, but it should be updated.
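As one illustration of an exception that carries more useful information, here is a minimal sketch of an error-checking wrapper in the spirit of slate_cuda_call. It is not SLATE's implementation: the check_call macro, the CallError class, and fake_library_call (a stand-in for a CUDA runtime call) are hypothetical, and the sketch assumes the wrapped call returns an int error code where 0 means success. The exception message records the call text, error code, enclosing function, file, and line.

```cpp
// Hypothetical error-checking wrapper; NOT SLATE's slate_cuda_call.
// Assumes the wrapped call returns an int error code, 0 = success.
#include <stdexcept>
#include <string>

// Exception that records the failed call, its error code, and where it failed.
class CallError : public std::runtime_error {
public:
    CallError( const std::string& call, int code,
               const char* file, int line, const char* func )
        : std::runtime_error( call + " failed with error " + std::to_string( code )
                              + " in " + func + " at " + file + ":"
                              + std::to_string( line ) ),
          code_( code )
    {}

    int code() const { return code_; }

private:
    int code_;
};

// Evaluates `call` once; on a nonzero result, throws CallError with the
// call's source text (#call) and the call site's file, line, and function.
#define check_call( call ) \
    do { \
        int check_call_err_ = (call); \
        if (check_call_err_ != 0) \
            throw CallError( #call, check_call_err_, \
                             __FILE__, __LINE__, __func__ ); \
    } while (0)

// Hypothetical stand-in for a library call that returns an error code.
int fake_library_call( int status ) { return status; }
```

Using a macro rather than a function is what makes __FILE__, __LINE__, and __func__ refer to the call site instead of the helper. When lldb stops in __cxa_throw, the exception object then already says which call failed and where.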

test/c++> make debug-exception
g++ -Wall -pedantic -std=c++11 -fopenmp -c -o debug-exception.o debug-exception.cc
g++ -fopenmp -o debug-exception debug-exception.o


### Run ./debug-exception 3, which throws in foo3.
thyme test/c++> ./debug-exception 3
main( 3 )
foo4( 3 )
foo3( 3, tid 0 )
foo3( 3, tid 1 )
foo3( 3, tid 2 )
terminate called recursively
terminate called recursively
foo3( 3, tid 3 )
Abort
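Why this run aborts: the OpenMP specification requires an exception thrown inside a parallel region to be caught within that region by the thread that threw it. The exception from foo3 escapes foo4's parallel region, and in practice with GCC that ends in std::terminate; the "terminate called recursively" lines are most likely several threads reaching the terminate handler at once. debug-exception.cc aborts on purpose, but if code needs to report such an error instead, one common pattern (a swapped-in technique, not something this page uses) is to catch inside the region, keep the first exception in a std::exception_ptr, and rethrow it after the region. The names work and parallel_work are hypothetical.

```cpp
// Sketch: propagate an exception out of an OpenMP parallel loop.
// With -fopenmp the loop runs in parallel; without it the pragmas are
// ignored and the loop runs serially with the same result.
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical per-iteration work that throws when iter == bad_iter.
void work( int iter, int bad_iter )
{
    if (iter == bad_iter)
        throw std::runtime_error( "failure in iteration " + std::to_string( iter ) );
}

// Runs work( iter, bad_iter ) for iter in [0, n). Exceptions are caught
// inside the region; the first one saved is rethrown after the region.
void parallel_work( int n, int bad_iter )
{
    std::exception_ptr error = nullptr;  // shared by all threads

    #pragma omp parallel for
    for (int iter = 0; iter < n; ++iter) {
        try {
            work( iter, bad_iter );
        }
        catch (...) {
            // critical serializes access to the shared exception_ptr.
            #pragma omp critical
            {
                if (! error)
                    error = std::current_exception();
            }
        }
    }

    if (error)
        std::rethrow_exception( error );
}
```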


### Now run it in the debugger.
thyme test/c++> lldb ./debug-exception
(lldb) target create "./debug-exception"
Current executable set to '/Users/mgates/Documents/test/c++/debug-exception' (x86_64).


### Set breakpoint on throwing C++ exceptions.
(lldb) break set -n __cxa_throw
Breakpoint 1: 2 locations.


### Run ./debug-exception 0, which doesn't throw an exception.
(lldb) run 0
Process 91619 launched: '/Users/mgates/Documents/test/c++/debug-exception' (x86_64)
main( 0 )
foo4( 0 )
foo3( 0, tid 1 )
foo2( 0, tid 1 )
foo1( 0, tid 1 )
foo3( 0, tid 1 )
foo2( 0, tid 1 )
foo1( 0, tid 1 )
foo3( 0, tid 0 )
foo2( 0, tid 0 )
foo1( 0, tid 0 )
foo3( 0, tid 0 )
foo3( 0, tid 3 )
foo3( 0, tid 2 )
foo2( 0, tid 2 )
foo1( 0, tid 2 )
foo2( 0, tid 3 )
foo1( 0, tid 3 )
foo3( 0, tid 3 )
foo2( 0, tid 3 )
foo3( 0, tid 2 )
foo2( 0, tid 0 )
foo2( 0, tid 2 )
foo1( 0, tid 3 )
foo1( 0, tid 2 )
foo3( 0, tid 1 )
foo2( 0, tid 1 )
foo1( 0, tid 1 )
foo1( 0, tid 0 )
foo3( 0, tid 0 )
foo2( 0, tid 0 )
foo1( 0, tid 0 )
Process 91619 exited with status = 0 (0x00000000)


### Run ./debug-exception 2, which throws an exception in foo2.
(lldb) run 2
Process 91625 launched: '/Users/mgates/Documents/test/c++/debug-exception' (x86_64)
main( 2 )
foo4( 2 )
foo3( 2, tid 0 )
foo2( 2, tid 0 )
foo3( 2, tid 2 )
foo2( 2, tid 2 )
foo3( 2, tid 1 )
Process 91625 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100122260 libstdc++.6.dylib`__cxa_throw
libstdc++.6.dylib`__cxa_throw:
->  0x100122260 <+0>: pushq  %r13
    0x100122262 <+2>: movq   %rdx, %r13
    0x100122265 <+5>: pushq  %r12
    0x100122267 <+7>: movq   %rsi, %r12
  thread #3, stop reason = breakpoint 1.1
    frame #0: 0x0000000100122260 libstdc++.6.dylib`__cxa_throw
libstdc++.6.dylib`__cxa_throw:
->  0x100122260 <+0>: pushq  %r13
    0x100122262 <+2>: movq   %rdx, %r13
    0x100122265 <+5>: pushq  %r12
    0x100122267 <+7>: movq   %rsi, %r12
Target 0: (debug-exception) stopped.


### Looking at the backtrace (bt), we see it was in foo2.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100122260 libstdc++.6.dylib`__cxa_throw
    frame #1: 0x0000000100003ac2 debug-exception`foo2(int, int) + 104
    frame #2: 0x0000000100003b4f debug-exception`foo3(int, int) + 119
    frame #3: 0x0000000100003d0a debug-exception`foo4(int) (._omp_fn.0) + 100
    frame #4: 0x0000000100452bd2 libgomp.1.dylib`GOMP_parallel + 66
    frame #5: 0x0000000100003bd9 debug-exception`foo4(int) + 131
    frame #6: 0x0000000100003c3a debug-exception`main + 90
    frame #7: 0x00007fff6ce1bcc9 libdyld.dylib`start + 1


### Kill the stopped process, then run ./debug-exception 3, which throws an exception in foo3.
(lldb) kill
Process 91625 exited with status = 9 (0x00000009)
(lldb) run 3
Process 91633 launched: '/Users/mgates/Documents/test/c++/debug-exception' (x86_64)
main( 3 )
foo4( 3 )
foo3( 3, tid 0 )
foo3( 3, tid 1 )
foo3( 3, tid 2 )
Process 91633 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100122260 libstdc++.6.dylib`__cxa_throw
libstdc++.6.dylib`__cxa_throw:
->  0x100122260 <+0>: pushq  %r13
    0x100122262 <+2>: movq   %rdx, %r13
    0x100122265 <+5>: pushq  %r12
    0x100122267 <+7>: movq   %rsi, %r12
  thread #2, stop reason = breakpoint 1.1
    frame #0: 0x0000000100122260 libstdc++.6.dylib`__cxa_throw
libstdc++.6.dylib`__cxa_throw:
->  0x100122260 <+0>: pushq  %r13
    0x100122262 <+2>: movq   %rdx, %r13
    0x100122265 <+5>: pushq  %r12
    0x100122267 <+7>: movq   %rsi, %r12
Target 0: (debug-exception) stopped.


### Looking at the backtrace (bt), we see it was in foo3.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100122260 libstdc++.6.dylib`__cxa_throw
    frame #1: 0x0000000100003b40 debug-exception`foo3(int, int) + 104
    frame #2: 0x0000000100003d0a debug-exception`foo4(int) (._omp_fn.0) + 100
    frame #3: 0x0000000100452bd2 libgomp.1.dylib`GOMP_parallel + 66
    frame #4: 0x0000000100003bd9 debug-exception`foo4(int) + 131
    frame #5: 0x0000000100003c3a debug-exception`main + 90
    frame #6: 0x00007fff6ce1bcc9 libdyld.dylib`start + 1
(lldb) kill
Process 91633 exited with status = 9 (0x00000009)
(lldb) ^D

Debugging MPI


Recompile SLATE with debugging

SLATE needs to be compiled with -g, and at least test/test.o should be compiled with -O0. Here's my make.inc file on leconte, showing the additions:

slate> cat make.inc
CXX         = mpicxx
FC          = mpif90

####################  Added these lines  ####################
CXXFLAGS    = -g -Wno-unused-variable

# This is in SLATE's GNUmakefile since https://bitbucket.org/icl/slate-dev/pull-requests/137
# For `tester --debug` purposes, compile test.o with -O0 (after -O3).
test/test.o: CXXFLAGS += -O0
####################                     ####################

# BLAS can be mkl or openblas (or others on other systems). Choose one.
blas        = mkl
#blas       = openblas

# Intel MKL supports gfortran conventions and ifort conventions.
# Choose one to match mpif90 compiler.
blas_fortran  = gfortran
#blas_fortran = ifort

# Intel MKL supports Open MPI and Intel MPI.
# Choose one to match MPI library.
#mkl_blacs  = openmpi
mkl_blacs   = intelmpi

cuda_arch   = volta
gpu_backend = cuda

For instance, when I compile test.o, the command is:

    mpicxx -g -Wno-unused-variable -O3 -std=c++17 \
        -Wall -Wshadow -pedantic -MMD -fPIC -fopenmp \
        -DSLATE_WITH_MKL -DSLATE_NO_HIP \
        -I./blaspp/include -I./lapackpp/include -I./include -I./src \
        -O0 -I./testsweeper -c test/test.cc -o test/test.o

where the later -O0 overrides the earlier -O3; with GCC (and Clang), the last -O option on the command line is the one that takes effect.


Run tester with MPI

Here's an example that fails. I added the line Tile A00 = A( 0, 0 ); to src/internal/internal_gemm.cc, which fails on ranks that don't store tile A( 0, 0 ).

slate/test> mpirun -np 4 ./tester gemm
SLATE version 2022.05.00, id 483bde4a
input: ./tester gemm
2022-06-02 14:38:26, MPI size 4, OpenMP threads 20, GPU devices available 8

type  origin  target    m  ...     error   time (s)  ...  status
   d    host    task  100  ...  3.11e-16   0.000441  ...  pass
   d    host    task  200  ...  4.60e-16    0.00149  ...  pass
   d    host    task  300  ...  2.92e-16    0.00304  ...  pass
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
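The "what(): map::at" line is libstdc++'s message for a std::out_of_range thrown by std::map::at when a key is missing; the backtrace later on this page (frames #2 and #3, in stl_map.h and MatrixStorage.hh) confirms that the tile lookup goes through a std::map. The following is a hypothetical, much simplified model of that failure mode, not SLATE's MatrixStorage: each rank keeps only the tiles it owns in a std::map keyed by (i, j), using an assumed 2D block-cyclic owner function on a p x q process grid.

```cpp
// Hypothetical model of per-rank tile storage; NOT SLATE's MatrixStorage.
// Each rank stores only the tiles it owns, keyed by (i, j).
#include <cstdint>
#include <map>
#include <stdexcept>
#include <utility>

struct Tile { double value = 0; };  // placeholder for a real tile

class LocalTiles {
public:
    // Stores tile (i, j) only if `rank` owns it on a p x q grid.
    LocalTiles( std::int64_t mt, std::int64_t nt, int p, int q, int rank )
    {
        for (std::int64_t j = 0; j < nt; ++j)
            for (std::int64_t i = 0; i < mt; ++i)
                if (owner( i, j, p, q ) == rank)
                    tiles_[ std::make_pair( i, j ) ] = Tile{};
    }

    // Like the injected A( 0, 0 ): std::map::at throws std::out_of_range
    // when (i, j) is not stored on this rank.
    Tile& at( std::int64_t i, std::int64_t j )
    {
        return tiles_.at( std::make_pair( i, j ) );
    }

    // Checking first avoids the exception.
    bool is_local( std::int64_t i, std::int64_t j ) const
    {
        return tiles_.count( std::make_pair( i, j ) ) > 0;
    }

private:
    // Assumed 2D block-cyclic distribution with a column-major process grid.
    static int owner( std::int64_t i, std::int64_t j, int p, int q )
    {
        return int( (i % p) + (j % q) * p );
    }

    std::map< std::pair< std::int64_t, std::int64_t >, Tile > tiles_;
};
```

On a 2 x 2 grid, rank 0 owns tile (0, 0) and at( 0, 0 ) succeeds there, while on ranks 1 to 3 the same call throws std::out_of_range, matching the failure above.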

Adding the --debug R flag to the tester causes rank R to wait for a debugger to attach (here, R = 1).

slate/test> mpirun -np 4 ./tester --debug 1 gemm
MPI rank 1, pid 71503 on leconte.icl.utk.edu ready for debugger (gdb/lldb) to attach.
After attaching, step out to run() and set i=1, e.g.:
lldb -p 71503
(lldb) break set -n __cxa_throw  # break on C++ exception
(lldb) thread step-out           # repeat
(lldb) expr i=1
(lldb) continue

Rank 1 waits here for a debugger to attach. Once a debugger attaches and continues execution (see below), the tester will keep going.

SLATE version 2022.05.00, id 483bde4a
input: ./tester --debug 1 gemm
2022-06-02 14:41:31, MPI size 4, OpenMP threads 20, GPU devices available 8

type  origin  target    m  ...     error   time (s)  ...  status
   d    host    task  100  ...  3.14e-16   0.000483  ...  pass
   d    host    task  200  ...  4.48e-16    0.00147  ...  pass
   d    host    task  300  ...  2.83e-16    0.00319  ...  pass
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 71502 RUNNING AT leconte.icl.utk.edu
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

lldb session

Run the lldb or gdb debugger in a separate terminal, attaching to the tester process as instructed by the message SLATE's tester printed above.

> lldb -p 71503
Process 71503 stopped
* thread #1, name = 'tester', stop reason = signal SIGSTOP
    frame #0: 0x00007f340f8c89fd libc.so.6`__nanosleep + 45
libc.so.6`__nanosleep:
->  0x7f340f8c89fd <+45>: movq   (%rsp), %rdi
    0x7f340f8c8a01 <+49>: movq   %rax, %rdx
    0x7f340f8c8a04 <+52>: callq  0x7f340f90f890            ; __libc_disable_asynccancel
    0x7f340f8c8a09 <+57>: movq   %rdx, %rax
  thread #2, name = 'cuda-EvtHandlr', stop reason = signal SIGSTOP
    frame #0: 0x00007f340f8f6ddd libc.so.6`poll + 45
libc.so.6`poll:
->  0x7f340f8f6ddd <+45>: movq   (%rsp), %rdi
    0x7f340f8f6de1 <+49>: movq   %rax, %rdx
    0x7f340f8f6de4 <+52>: callq  0x7f340f90f890            ; __libc_disable_asynccancel
    0x7f340f8f6de9 <+57>: movq   %rdx, %rax

Break on C++ exceptions:

(lldb) break set -n __cxa_throw
Breakpoint 2: where = libstdc++.so.6`__cxxabiv1::__cxa_throw(void *, std::type_info *, void (*)(void *)) at eh_throw.cc:77:1, address = 0x00007f34103cdff0

It's helpful to run a backtrace and disassembly automatically whenever the debugger stops; here's a stop hook from an lldb cheat sheet. I've found that other MPI ranks sometimes abort the whole program before there is time to run debugger commands manually.

(lldb) target stop-hook add
Enter your stop hook command(s).  Type 'DONE' to end.
> bt
> disassemble --pc
DONE
Stop hook #1 added.

Initially the debugger will probably be stopped in some system sleep routine (__nanosleep). Use thread step-out a few times until it shows the SLATE tester source code with while (0 == i).

(lldb) thread step-out
* thread #1, name = 'tester', stop reason = step out
  * frame #0: 0x00007f340f8c8894 libc.so.6`sleep + 212
    frame #1: 0x000000000045d2c7 tester`run(argc=4, argv=0x00007ffe1f6d35d8) at test.cc:651:22
    frame #2: 0x000000000045db42 tester`main(argc=4, argv=0x00007ffe1f6d35d8) at test.cc:764:21
    frame #3: 0x00007f340f825555 libc.so.6`__libc_start_main + 245
    frame #4: 0x0000000000459677 tester`_start + 41

libc.so.6`sleep:
->  0x7f340f8c8894 <+212>: movl   %eax, %ebx
    0x7f340f8c8896 <+214>: testl  %ebx, %ebx
    0x7f340f8c8898 <+216>: je     0x7f340f8c88c0            ; <+256>
    0x7f340f8c889a <+218>: xorl   %ebp, %ebp

Process 71503 stopped
* thread #1, name = 'tester', stop reason = step out
    frame #0: 0x00007f340f8c8894 libc.so.6`sleep + 212
libc.so.6`sleep:
->  0x7f340f8c8894 <+212>: movl   %eax, %ebx
    0x7f340f8c8896 <+214>: testl  %ebx, %ebx
    0x7f340f8c8898 <+216>: je     0x7f340f8c88c0            ; <+256>
    0x7f340f8c889a <+218>: xorl   %ebp, %ebp
(lldb) thread step-out
* thread #1, name = 'tester', stop reason = step out
  * frame #0: 0x000000000045d2c7 tester`run(argc=4, argv=0x00007ffe1f6d35d8) at test.cc:650:13
    frame #1: 0x000000000045db42 tester`main(argc=4, argv=0x00007ffe1f6d35d8) at test.cc:764:21
    frame #2: 0x00007f340f825555 libc.so.6`__libc_start_main + 245
    frame #3: 0x0000000000459677 tester`_start + 41

tester`run:
->  0x45d2c7 <+2711>: jmp    0x45d2ae                  ; <+2686> at test.cc:650:22
    0x45d2c9 <+2713>: movl   $0x44000000, %edi         ; imm = 0x44000000
    0x45d2ce <+2718>: callq  0x435230                  ; symbol stub for: MPI_Barrier
    0x45d2d3 <+2723>: movl   %eax, -0x64(%rbp)

Process 71503 stopped
* thread #1, name = 'tester', stop reason = step out
    frame #0: 0x000000000045d2c7 tester`run(argc=4, argv=0x00007ffe1f6d35d8) at test.cc:650:13
   647 	                    "(lldb) continue\n",
   648 	                    mpi_rank, getpid(), hostname, getpid() );
   649 	            fflush( stdout );
-> 650 	            while (0 == i)
   651 	                sleep(1);
   652 	        }
   653 	        slate_mpi_call( MPI_Barrier( MPI_COMM_WORLD ) );

Running expr i=1 sets i to 1, which breaks out of that while loop. If the debugger doesn't know the variable i, check that test.o was compiled with -g and -O0.

(lldb) expr i=1
(volatile int) $1 = 1
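The --debug mechanism is just the loop in the test.cc excerpt above: rank R spins on a volatile int i until a debugger changes it (lldb: expr i=1; gdb: set var i = 1). Here is a minimal standalone sketch of that idiom. The function name wait_for_debugger and its arguments are hypothetical; in the real tester the rank comes from MPI and the message is longer. Declaring i volatile forces the loop to re-read i from memory on every iteration, so the debugger's change is seen, and compiling with -g -O0 makes it more likely that the debugger can find i by name.

```cpp
// Sketch of the wait-for-debugger idiom behind `tester --debug R`.
// wait_for_debugger is a hypothetical name; in the tester, rank would come
// from MPI_Comm_rank and debug_rank from the --debug option.
#include <cstdio>
#include <unistd.h>  // getpid, gethostname, sleep (POSIX)

// Returns false immediately unless rank == debug_rank. Otherwise prints the
// pid to attach to and blocks until a debugger sets i to nonzero.
bool wait_for_debugger( int rank, int debug_rank )
{
    if (rank != debug_rank)
        return false;

    // volatile: re-read on every iteration so the debugger's write ends the loop.
    volatile int i = 0;

    char hostname[ 256 ] = "";  // zero-filled; size - 1 below keeps a trailing nul
    gethostname( hostname, sizeof( hostname ) - 1 );
    std::printf( "MPI rank %d, pid %d on %s ready for debugger to attach.\n",
                 rank, int( getpid() ), hostname );
    std::fflush( stdout );

    while (0 == i)
        sleep( 1 );
    return true;
}
```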

Continue running until a C++ exception or breakpoint occurs, or the program completes. Here it stopped at a C++ exception; frame #5 of the backtrace shows that it occurred in slate::internal::gemm at internal_gemm.cc line 76, which is indeed where the error was injected.

(lldb) continue
Process 71503 resuming
  thread #11, name = 'tester', stop reason = breakpoint 1.1 2.1
    frame #0: 0x00007f34103cdff0 libstdc++.so.6`__cxxabiv1::__cxa_throw(obj=0x00007f3198000960, tinfo=0x00007f34106e9228, dest=(libstdc++.so.6`std::out_of_range::~out_of_range() at stdexcept.cc:65:33))(void *)) at eh_throw.cc:77:1
    frame #1: 0x00007f34103c5352 libstdc++.so.6`std::__throw_out_of_range(__s="map::at") at functexcept.cc:82:5
    frame #2: 0x00000000004b3c4c tester`slate::BaseMatrix<double>::operator()(long, long, int) at stl_map.h:541:24
    frame #3: 0x00000000004b3c40 tester`slate::BaseMatrix<double>::operator()(long, long, int) at MatrixStorage.hh:388
    frame #4: 0x00000000004b3c40 tester`slate::BaseMatrix<double>::operator(this=0x00007f330cb6edc0, i=0, j=0, device=-1)(long, long, int) at BaseMatrix.hh:1236
    frame #5: 0x00007f34239b696e libslate.so`void slate::internal::gemm<double>((null)=TargetType<(slate::Target)84> @ 0x00007f330cb6ece0, alpha=3.1415926535897931, A=0x00007f330cb6edc0, B=0x00007f330cb6ed40, beta=2.7182818284590451, C=0x00007ffe1f6cde40, layout=ColMajor, priority=0, queue_index=<unavailable>, opts=error: summary string parsing error)84>, double, slate::Matrix<double>&, slate::Matrix<double>&, double, slate::Matrix<double>&, blas::Layout, int, long, std::map<slate::Option, slate::OptionValue, std::less<slate::Option>, std::allocator<std::pair<slate::Option const, slate::OptionValue> > > const&) at internal_gemm.cc:76:13
    frame #6: 0x00007f34239b6f77 libslate.so`void slate::internal::gemm<(slate::Target)84, double>(alpha=<unavailable>, A=<unavailable>, B=<unavailable>, beta=<unavailable>, C=<unavailable>, layout=<unavailable>, priority=<unavailable>, queue_index=<unavailable>, opts=error: summary string parsing error) at internal_gemm.cc:52:9
    frame #7: 0x00007f3423d4f3ad libslate.so`_ZN5slate5gemmCILNS_6TargetE84EdEEvT0_RNS_6MatrixIS2_EES5_S2_S5_RKSt3mapINS_6OptionENS_11OptionValueESt4lessIS7_ESaISt4pairIKS7_S8_EEE._omp_fn.4((null)=0x00007f330cb6edc0) at gemmC.cc:106:35
    frame #8: 0x00007f340fdfe1f4 libgomp.so.1`gomp_barrier_handle_tasks(state=320) at task.c:1387:6
    frame #9: 0x00007f340fe05818 libgomp.so.1`gomp_team_barrier_wait_end(bar=<unavailable>, state=320) at bar.c:116:4
    frame #10: 0x00007f340fe02e32 libgomp.so.1`gomp_thread_start(xdata=<unavailable>) at team.c:124:4
    frame #11: 0x00007f341bab7ea5 libpthread.so.0`start_thread + 197
    frame #12: 0x00007f340f901b0d libc.so.6`__clone + 109

libstdc++.so.6`__cxxabiv1::__cxa_throw(void *, std::type_info *, void (*)(void *)):
->  0x7f34103cdff0 <+0>: pushq  %r13
    0x7f34103cdff2 <+2>: movq   %rdx, %r13
    0x7f34103cdff5 <+5>: pushq  %r12
    0x7f34103cdff7 <+7>: movq   %rsi, %r12

Process 71503 stopped
* thread #11, name = 'tester', stop reason = breakpoint 1.1 2.1
    frame #0: 0x00007f34103cdff0 libstdc++.so.6`__cxxabiv1::__cxa_throw(obj=0x00007f3198000960, tinfo=0x00007f34106e9228, dest=(libstdc++.so.6`std::out_of_range::~out_of_range() at stdexcept.cc:65:33))(void *)) at eh_throw.cc:77:1
Process 71503 exited with status = -1 (0xffffffff) debugserver died with an exit status of 0x00000000

lldb startup

Initial commands can be put into an init.lldb file:

break set -n __cxa_throw

target stop-hook add
bt
disassemble --pc
DONE

which lldb reads and executes when started with -s:

lldb -s init.lldb -p 71503