Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use after free below update_syscallbuf_fds_disabled #3475

Closed
bernhardu opened this issue Mar 28, 2023 · 9 comments
Closed

Use after free below update_syscallbuf_fds_disabled #3475

bernhardu opened this issue Mar 28, 2023 · 9 comments

Comments

@bernhardu
Copy link
Contributor

bernhardu commented Mar 28, 2023

I ran the tests with a force32bit build, but found the test reverse_step_breakpoint failing.
A git bisect seems to point to f46af28 as first broken commit.

As I had trouble to record rr itself, I undusted my asan patches and it shows the same use after free for x86_64.
Below is a manual replay of the test.

$ export ASAN_OPTIONS=fast_unwind_on_malloc=0
$ bin/rr replay --debugger-option=-q /tmp/rr-test-reverse_step_breakpoint-2KELOrNOA/simple-2KELOrNOA-0
Reading symbols from /tmp/rr-test-reverse_step_breakpoint-2KELOrNOA/simple-2KELOrNOA-0/mmap_hardlink_4_simple-2KELOrNOA...
Really redefine built-in command "restart"? (y or n) [answered Y; input not from terminal]
Really redefine built-in command "jump"? (y or n) [answered Y; input not from terminal]
Remote debugging using 127.0.0.1:40258
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Reading symbols from /usr/lib/debug/.build-id/4f/536ac1cd2e8806aed8556ea7795c47404de8a9.debug...
BFD: warning: system-supplied DSO at 0x6fffd000 has a section extending past end of file
0x00007faeb98699c0 in _start () from /lib64/ld-linux-x86-64.so.2
(rr) b main
Breakpoint 1 at 0x55817cc4a29a: file .../rr/src/test/simple.c, line 6.
(rr) cont
Continuing.

Breakpoint 1, main () at .../rr/src/test/simple.c:6
6         atomic_puts("EXIT-SUCCESS");
(rr) next
EXIT-SUCCESS
7         return 0;
(rr) b
Breakpoint 2 at 0x55817cc4a2a9: file .../rr/src/test/simple.c, line 7.
(rr) reverse-next
=================================================================
==695618==ERROR: AddressSanitizer: heap-use-after-free on address 0x617000005ba0 at pc 0x56148d37f34a bp 0x7fffb4bfe060 sp 0x7fffb4bfe058
READ of size 8 at 0x617000005ba0 thread T0
    #0 0x56148d37f349 in std::_Rb_tree<rr::Task*, rr::Task*, std::_Identity<rr::Task*>, std::less<rr::Task*>, std::allocator<rr::Task*> >::begin() const /usr/include/c++/12/bits/stl_tree.h:1000
    #1 0x56148d37f349 in std::set<rr::Task*, std::less<rr::Task*>, std::allocator<rr::Task*> >::begin() const /usr/include/c++/12/bits/stl_set.h:345
    #2 0x56148d37f349 in rr::FdTable::update_syscallbuf_fds_disabled(int) .../rr/src/FdTable.cc:182
    #3 0x56148d37f53b in rr::FdTable::did_close(int) .../rr/src/FdTable.cc:146
    #4 0x56148d8df69f in void rr::Task::on_syscall_exit_arch<rr::X64Arch>(int, rr::Registers const&) .../rr/src/Task.cc:670
    #5 0x56148d8a209a in operator() .../rr/src/Task.cc:839
    #6 0x56148d8a209a in with_converted_registers<void, rr::Task::on_syscall_exit(int, rr::SupportedArch, const rr::Registers&)::<lambda(const rr::Registers&)> > .../rr/src/Registers.h:624
    #7 0x56148d8a223a in rr::Task::on_syscall_exit(int, rr::SupportedArch, rr::Registers const&) .../rr/src/Task.cc:838
    #8 0x56148d714791 in rr::ReplaySession::exit_syscall(rr::ReplayTask*) .../rr/src/ReplaySession.cc:659
    #9 0x56148d736a72 in rr::ReplaySession::try_one_trace_step(rr::ReplayTask*, rr::ReplaySession::StepConstraints const&) .../rr/src/ReplaySession.cc:1606
    #10 0x56148d73b7a0 in rr::ReplaySession::replay_step(rr::ReplaySession::StepConstraints const&) .../rr/src/ReplaySession.cc:1922
    #11 0x56148d7b6b35 in rr::ReplayTimeline::reverse_singlestep(rr::ReplayTimeline::Mark const&, rr::TaskishUid<rr::Task> const&, long, std::function<bool (rr::ReplayTask*, rr::BreakStatus const&)> const&, std::function<bool ()> const&) .../rr/src/ReplayTimeline.cc:1232
    #12 0x56148d7c5557 in rr::ReplayTimeline::reverse_singlestep(rr::TaskishUid<rr::Task> const&, long, std::function<bool (rr::ReplayTask*, rr::BreakStatus const&)> const&, std::function<bool ()> const&) .../rr/src/ReplayTimeline.cc:1492
    #13 0x56148d430410 in rr::GdbServer::debug_one_step(rr::GdbRequest&) .../rr/src/GdbServer.cc:1412
    #14 0x56148d43498a in rr::GdbServer::serve_replay(rr::GdbServer::ConnectionFlags const&) .../rr/src/GdbServer.cc:1831
    #15 0x56148d70b8f1 in replay .../rr/src/ReplayCommand.cc:518
    #16 0x56148d70b8f1 in rr::ReplayCommand::run(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) .../rr/src/ReplayCommand.cc:632
    #17 0x56148d286e8e in main .../rr/src/main.cc:299
    #18 0x7fab92446189 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #19 0x7fab92446244 in __libc_start_main_impl ../csu/libc-start.c:381
    #20 0x56148d286fd0 in _start (.../obj/bin/rr+0xfdfd0)

0x617000005ba0 is located 32 bytes inside of 736-byte region [0x617000005b80,0x617000005e60)
freed by thread T0 here:
    #0 0x7fab92aba3c8 in operator delete(void*, unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:164
    #1 0x56148d88d910 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /usr/include/c++/12/bits/shared_ptr_base.h:346
    #2 0x56148d88d910 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /usr/include/c++/12/bits/shared_ptr_base.h:317
    #3 0x56148d88d910 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() /usr/include/c++/12/bits/shared_ptr_base.h:1071
    #4 0x56148d88d910 in std::__shared_ptr<rr::AddressSpace, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() /usr/include/c++/12/bits/shared_ptr_base.h:1524
    #5 0x56148d88d910 in std::shared_ptr<rr::AddressSpace>::~shared_ptr() /usr/include/c++/12/bits/shared_ptr.h:175
    #6 0x56148d88d910 in rr::Task::~Task() .../rr/src/Task.cc:267
    #7 0x56148d7842e6 in rr::ReplayTask::~ReplayTask() .../rr/src/ReplayTask.h:17
    #8 0x56148d7842e6 in rr::ReplayTask::~ReplayTask() .../rr/src/ReplayTask.h:17
    #9 0x56148d813ff9 in rr::Session::kill_all_tasks() .../rr/src/Session.cc:231
    #10 0x56148d719738 in rr::ReplaySession::~ReplaySession() .../rr/src/ReplaySession.cc:234
    #11 0x56148d7402d5 in rr::ReplaySession::~ReplaySession() .../rr/src/ReplaySession.cc:238
    #12 0x56148d7402d5 in std::_Sp_counted_ptr<rr::ReplaySession*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() /usr/include/c++/12/bits/shared_ptr_base.h:428
    #13 0x56148d2dcb3d in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /usr/include/c++/12/bits/shared_ptr_base.h:346
    #14 0x56148d2dcb3d in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /usr/include/c++/12/bits/shared_ptr_base.h:317
    #15 0x56148d79d2c9 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() /usr/include/c++/12/bits/shared_ptr_base.h:1071
    #16 0x56148d79d2c9 in std::__shared_ptr<rr::ReplaySession, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() /usr/include/c++/12/bits/shared_ptr_base.h:1524
    #17 0x56148d79d2c9 in std::__shared_ptr<rr::ReplaySession, (__gnu_cxx::_Lock_policy)2>::operator=(std::__shared_ptr<rr::ReplaySession, (__gnu_cxx::_Lock_policy)2>&&) /usr/include/c++/12/bits/shared_ptr_base.h:1620
    #18 0x56148d79d2c9 in std::shared_ptr<rr::ReplaySession>::operator=(std::shared_ptr<rr::ReplaySession>&&) /usr/include/c++/12/bits/shared_ptr.h:440
    #19 0x56148d79d2c9 in rr::ReplayTimeline::seek_to_before_key(rr::ReplayTimeline::MarkKey const&) .../rr/src/ReplayTimeline.cc:345
    #20 0x56148d7b4c76 in rr::ReplayTimeline::reverse_singlestep(rr::ReplayTimeline::Mark const&, rr::TaskishUid<rr::Task> const&, long, std::function<bool (rr::ReplayTask*, rr::BreakStatus const&)> const&, std::function<bool ()> const&) .../rr/src/ReplayTimeline.cc:1197
    #21 0x56148d7c5557 in rr::ReplayTimeline::reverse_singlestep(rr::TaskishUid<rr::Task> const&, long, std::function<bool (rr::ReplayTask*, rr::BreakStatus const&)> const&, std::function<bool ()> const&) .../rr/src/ReplayTimeline.cc:1492
    #22 0x56148d430410 in rr::GdbServer::debug_one_step(rr::GdbRequest&) .../rr/src/GdbServer.cc:1412
    #23 0x56148d43498a in rr::GdbServer::serve_replay(rr::GdbServer::ConnectionFlags const&) .../rr/src/GdbServer.cc:1831
    #24 0x56148d70b8f1 in replay .../rr/src/ReplayCommand.cc:518
    #25 0x56148d70b8f1 in rr::ReplayCommand::run(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) .../rr/src/ReplayCommand.cc:632
    #26 0x56148d286e8e in main .../rr/src/main.cc:299
    #27 0x7fab92446189 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #28 0x7fab92446244 in __libc_start_main_impl ../csu/libc-start.c:381
    #29 0x56148d286fd0 in _start (.../obj/bin/rr+0xfdfd0)

previously allocated by thread T0 here:
    #0 0x7fab92ab94c8 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:95
    #1 0x56148d836b0f in rr::Session::clone(rr::Task*, std::shared_ptr<rr::AddressSpace>) .../rr/src/Session.cc:120
    #2 0x56148d8ab2a5 in rr::Task::clone(rr::Task::CloneReason, int, rr::remote_ptr<void>, rr::remote_ptr<void>, rr::remote_ptr<int>, int, int, unsigned int, rr::Session*, std::shared_ptr<rr::FdTable>, std::shared_ptr<rr::ThreadGroup>) .../rr/src/Task.cc:2326
    #3 0x56148d89c8ee in rr::Task::os_clone(rr::Task::CloneReason, rr::Session*, rr::AutoRemoteSyscalls&, int, unsigned int, unsigned int, std::shared_ptr<rr::FdTable>, std::shared_ptr<rr::ThreadGroup>, rr::remote_ptr<void>, rr::remote_ptr<int>, rr::remote_ptr<void>, rr::remote_ptr<int>) .../rr/src/Task.cc:3214
    #4 0x56148d89d234 in rr::Task::os_fork_into(rr::Session*, std::shared_ptr<rr::FdTable>) .../rr/src/Task.cc:2453
    #5 0x56148d82bea1 in rr::Session::copy_state_to(rr::Session&, rr::EmuFs&, rr::EmuFs&) .../rr/src/Session.cc:682
    #6 0x56148d720ccc in rr::ReplaySession::clone() .../rr/src/ReplaySession.cc:249
    #7 0x56148d7b0eef in rr::ReplayTimeline::add_explicit_checkpoint() .../rr/src/ReplayTimeline.cc:297
    #8 0x56148d7b3750 in rr::ReplayTimeline::set_short_checkpoint() .../rr/src/ReplayTimeline.cc:1660
    #9 0x56148d7c09c4 in rr::ReplayTimeline::reverse_continue(std::function<bool (rr::ReplayTask*, rr::BreakStatus const&)> const&, std::function<bool ()> const&) .../rr/src/ReplayTimeline.cc:1037
    #10 0x56148d431302 in rr::GdbServer::debug_one_step(rr::GdbRequest&) .../rr/src/GdbServer.cc:1407
    #11 0x56148d43498a in rr::GdbServer::serve_replay(rr::GdbServer::ConnectionFlags const&) .../rr/src/GdbServer.cc:1831
    #12 0x56148d70b8f1 in replay .../rr/src/ReplayCommand.cc:518
    #13 0x56148d70b8f1 in rr::ReplayCommand::run(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) .../rr/src/ReplayCommand.cc:632
    #14 0x56148d286e8e in main .../rr/src/main.cc:299
    #15 0x7fab92446189 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #16 0x7fab92446244 in __libc_start_main_impl ../csu/libc-start.c:381
    #17 0x56148d286fd0 in _start (.../obj/bin/rr+0xfdfd0)

SUMMARY: AddressSanitizer: heap-use-after-free /usr/include/c++/12/bits/stl_tree.h:1000 in std::_Rb_tree<rr::Task*, rr::Task*, std::_Identity<rr::Task*>, std::less<rr::Task*>, std::allocator<rr::Task*> >::begin() const
Shadow bytes around the buggy address:
  0x0c2e7fff8b20: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e7fff8b30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e7fff8b40: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e7fff8b50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa fa
  0x0c2e7fff8b60: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c2e7fff8b70: fd fd fd fd[fd]fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e7fff8b80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e7fff8b90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e7fff8ba0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e7fff8bb0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e7fff8bc0: fd fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==695618==ABORTING
Remote connection closed
@rocallahan
Copy link
Collaborator

It looks like we're doing a UAF on AddressSpace but I can't reproduce it with manual instrumentation.

Can you share your ASAN patches? we should land ASAN support and run it on CI.

@bernhardu
Copy link
Contributor Author

bernhardu commented Mar 28, 2023

Originally I wanted to do some more tests with them before pushing again, but they are now rebased to current git tip and are in #2890. Run cmake with -Dasan=true, then I have run just the single test ctest -j1 --verbose --tests-regex '^reverse_step_breakpoint$'

@rocallahan
Copy link
Collaborator

Thanks. Unfortunately passes for me, with c87b7d0 cherrypicked onto "master" (36b02f0).

@rocallahan
Copy link
Collaborator

It also passes for me if I just check out bernhardu:cmake-option-asan, configure with cmake ../rr -DCMAKE_BUILD_TYPE=RELEASE -Dasan=TRUE, and run ctest -j1 --verbose --tests-regex '^reverse_step_breakpoint$'.

@bernhardu
Copy link
Contributor Author

That's bad. I forgot to mention that I am building with -DCMAKE_BUILD_TYPE=RelWithDebInfo,
which causes -O2 instead of -O3 for me. (Off topic: Should this be changed to use the same level?)
Further I am using Debian testing, with a Ryzen 1700, now I tested also in a VM with a i5-4210U.

cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DEXTRA_VERSION_STRING=git-5.6.0-143-gf380328e -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -Dasan=true ../../rr
make -j16
ctest -j1 --verbose --tests-regex '^reverse_step_breakpoint$'

That patch currently in my tree should use addionally -fno-omit-frame-pointer or backtraces are not complete. Above I used the "export ASAN_OPTIONS=fast_unwind_on_malloc=0" workaround.

@bernhardu
Copy link
Contributor Author

I continued to wonder what is going on. I could also not reproduce it with a real release and also not with a debug build.
I sprayed some logging into it, and found few Session::clone did happen, after which the FdTable::vms seemed to contain multiple AddressSpaces, which got already destroyed were still contained the set before valid looking ones.
So am further guessing that the cloned vms will get filled in the cloning process. Therefore tried to just remove the vms(other.vms), from FdTable.h to start in the clone with an empty one.
With this change a test run did now succeed, so is that the cause?

@rocallahan
Copy link
Collaborator

Good catch! That's the problem. But I want to figure out why I can't get that problem to show up easily in a test.

@bernhardu
Copy link
Contributor Author

Good to hear. Did asan detect the problem when you built with RelWithDebInfo?
Might be the order of the iterator in for (auto address_space : vms) { be different.
Therefore most of the time the valid entries get iterated first, and the function left
before the destroyed ones are getting looked at?
This should then show up by placing a second loop before, that really iterates through all elements?

@rocallahan
Copy link
Collaborator

Therefore most of the time the valid entries get iterated first, and the function left
before the destroyed ones are getting looked at?

That's right. During replay the destroyed AddressSpace* has to be the first one hit by the iterator, otherwise the first Task we see is not a recording task so we just stop before triggering anything bad.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants