-
Notifications
You must be signed in to change notification settings - Fork 587
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Do not merge] CI test, cmake option to enable address santitizer. #2890
base: master
Are you sure you want to change the base?
Conversation
Here "dynamically linked" means this?
Is this "NOT pass that library name" just related to LD_PRELOAD? Could you please give some more details? |
Ah, I was thinking that ASAN uses LD_PRELOAD, but I guess it doesn't the way you're building it. |
That would be at least needed when inspecting a library built with ASAN with an executable built without. |
Dang, you beat me to it! I've also been working on this, but it seems I spent too much time messing with hacks ;) I have some comments (and maybe some patches) that might help. |
CMakeLists.txt
Outdated
@@ -522,6 +522,12 @@ if (x86ish) | |||
set(RR_SOURCES ${RR_SOURCES} src/test/x86/cpuid_loop.S) | |||
endif() | |||
|
|||
option(asan "Build with address santitizer enabled.") | |||
if (asan) | |||
set(ASAN_FLAGS "-fno-omit-frame-pointer -fsanitize=address") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is -fno-omit-frame-pointer
necessary here? If this was because of less-than-useful stacktraces from leaks, there's ASAN_OPTIONS=fast_unwind_on_malloc=false
. I'm wondering if -fno-omit-frame-pointer
might be causing some of the test timeout issues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is
-fno-omit-frame-pointer
necessary here?
I am just using it because the usage example I looked at had it too. Have to test if that makes a difference.
At least even with this the stack e.g. in 7214d96 was not complete.
CMakeLists.txt
Outdated
@@ -1539,7 +1539,7 @@ if(BUILD_TESTS) | |||
# reduced single-core performance. | |||
exec_program(cat ARGS "/proc/cpuinfo" OUTPUT_VARIABLE CPUINFO) | |||
string(REGEX MATCH "^.*(Xeon Phi).*$" CPU_MODEL_PHI ${CPUINFO}) | |||
if(NOT "${CPU_MODEL_PHI}" STREQUAL "") | |||
if(NOT "${CPU_MODEL_PHI}" STREQUAL "" OR asan) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't seeing this kind of issue running with an ASan build. Would you mind checking if you're still getting timeouts without this commit and without -fno-omit-frame-pointer
? (Sorry, I know the test suite speed is kinda painful)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed -fno-omit-frame-pointer
, but timeouts seem unchanged:
1265/2625 Test #869: conditional_breakpoint_offload-no-syscallbuf ...................... Passed 261.44 sec
2620/2625 Test #2358: x86/string_instructions_async_signals_shared-32 ................... Passed 264.50 sec
1280/2625 Test #868: conditional_breakpoint_offload .................................... Passed 274.21 sec
2586/2625 Test #2180: conditional_breakpoint_offload-32 ................................. Passed 329.61 sec
2/3 Test #1217: record_replay-no-syscallbuf ...... Passed 533.47 sec
2624/2625 Test #2528: record_replay-32 .................................................. Passed 574.54 sec
2625/2625 Test #2529: record_replay-32-no-syscallbuf .................................... Passed 608.18 sec
2621/2625 Test #1216: record_replay ..................................................... Passed 741.66 sec
With these tests still failing:
1203 - nested_detach-no-syscallbuf (Failed)
2515 - nested_detach-32-no-syscallbuf (Failed)
So from my point of view the -fno-omit-frame-pointer
does not change much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to avoid misunderstanding - I run the test suite also in parallel (ctest --verbose -j$(nproc)
).
Therefore the it depends much of the related other tests that run at that time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't really solve the issue, and it actually masks the fact that support for running ASan binaries in rr is currently broken (I'll open a PR to fix that in a bit). This is actually what I've been stuck on for a few days.
I'm thinking maybe the right approach is to detect ASan-using binaries before they exec()
and fix-up LD_PRELOAD
there.
EDIT: Woops, this was regarding 23fafc1 (" Run child process with ASAN_OPTIONS=verify_asan_link_order=0. ")
src/Task.cc
Outdated
@@ -3671,6 +3671,12 @@ void Task::dup_from(Task *other) { | |||
create_mapping(this, remote_this, km); | |||
LOG(debug) << "Copying mapping into " << tid; | |||
if (!(km.flags() & MAP_SHARED)) { | |||
#if defined(__x86_64__) && defined(__SANITIZE_ADDRESS__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think possibly masks a real issue that maybe should be investigated? I haven't looked into it myself yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is just a workaround for asan mapping a really huge area, which cannot be allocated by plain malloc.
Maybe rr has to reproduce this mapping and copy just the "used" memory. Have not looked any further into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today I tried to modify the copy_mem_mapping to not try to allocate the whole chunk in one, instead to copy by e.g. 256 MB blocks. Unfortunately this did just start to fill RAM.
I guess if we really try to duplicate such a 20 TB "MAP_NORESERVE" mapping, a way is needed to tell pages which got accessed once and therefore are "in use" apart from these pages, which never got touched, and therefore the kernel did never allocate a hardware page for it. (If I understand this situation right?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing you can do here is read /proc/<pid>/pagemap
and check if the page frame number for each anonymous page is zero. If it is, the page is all zeroes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the hint.
I found the page frame number (PFN), bits 0-54 were all 0.
By testing the bit 63 "page present" seems to better show if something is in use.
Unfortunately testing the whole 20 TB increases the test time from 3 seconds to nearly 400 seconds.
And all in all there seem just around 126 pages of it in use.
So I guess this is not the final solution.
88792e9 contains my attempt to use /proc/pid/pagemap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That program processes 1M pages per read (so 4GB instead of 1GB per read), and uses pread()
to halve the number of syscalls, but I don't think that would explain why my program is 20x faster than your code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your code calls page_size()
, i.e. sysconf(_SC_PAGE_SIZE)
, twice per chunk (and twice per page present!) ... maybe that's really slow? Or maybe building rr non-optimized, the std::vector
operations (size()
and operator[]
) per page, are really slow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your sample code.
I tried to do some modifications:
- 4m55: before
- 5m18: Assigned page_size() to a local variable (tested just once, seems I did something in parallel)
- 2m10: replaced the vector by a c-array
Then I changed the loop to your version (I hope it is ok to use it?).
And then I noticed, that I commented the && km.size() >= 0x100000000
and therefore used
copy_mem_mapping_just_used
for all MAP_NORESERVE
mappings,
which might not use holes and therefore get copied slowly page by page.
If I just use it for I got it down to 35 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you use pread instead of lseek/read? You should.
copy_mem_mapping_just_used for all MAP_NORESERVE mappings,
which might not use holes and therefore get copied slowly page by page.
Seems like the right thing to do is to identify runs of consecutive pages that are present, and handle each run with a single copy_mem_mapping call. I think you should use copy_mem_mapping_just_used for all anonymous mappings (which you're not checking for currently!), not just MAP_NORESERVE, and not just for big mappings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remember correctly I first found that just the first lseek with read was needed, so moved lseek before the loop.
But I could not see an improvement in time. Similar for removing lseek and changing read to pread.
Then I ripped out my version and used your lines.
I tried to add this identification of consecutive pages and call it for all sizes of ANONYMOUS|NORESERVE mappings.
Because I was specifically interested in running just the tests against an asan enabled rr (where I intentionally did not build the tests with asan), I took this "shortcut". |
c32472e
to
1ed97e6
Compare
Rebased to current head, because a few cheap patches got merged separately. Remaining patches are unchanged. |
src/RecordSession.cc
Outdated
@@ -2230,6 +2230,10 @@ static string lookup_by_path(const string& name) { | |||
env.push_back("SYSTEMD_RDRAND=0"); | |||
} | |||
|
|||
#if defined(__SANITIZE_ADDRESS__) | |||
env.push_back("ASAN_OPTIONS=verify_asan_link_order=0"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been thinking about this, and I'd suggest that this should be fine as a testsuite-specific hack provided there's a comment mentioning that
- This is needed to run the test suite with an ASan-instrumented rr because of rr's lack of support for children
exec
ing ASan binaries, and should be removed if/when rr gains such support. - This is specific to the case of running the test suite and isn't supposed to help with general usage.
Please feel free to word it slightly less awkwardly ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to add a comment.
src/test/ignore_nested.c
Outdated
@@ -5,11 +5,17 @@ | |||
// environment. The runner uses this to test | |||
// --nested=ignore in the absence of RR_UNDER_RR | |||
// environment variable. | |||
int main(int argc, char *argv[]) { | |||
int main(int argc, char *argv[], char *envp[]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto above re comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole change to the test seems not needed in a tree that contains your vdso soname patch, so I remove this from my draft.
1ed97e6
to
17bae24
Compare
@Bob131 I hope it is ok to integrate your patches to this draft? |
ab793c9
to
30e13b8
Compare
src/RecordCommand.cc
Outdated
@@ -629,7 +629,7 @@ static WaitStatus record(const vector<string>& args, const RecordFlags& flags) { | |||
flags.use_syscall_buffer, flags.syscallbuf_desched_sig, | |||
flags.bind_cpu, flags.output_trace_dir, | |||
flags.trace_id.get(), | |||
flags.stap_sdt, flags.unmap_vdso); | |||
flags.stap_sdt, flags.unmap_vdso, flags.asan); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@khuey While working at this draft branch I found that the asan flag is currently set but never used, as far as I see.
And I am guessing right, that this flag is intended to go into RecordSession::create
?
Should I put this into a separate PR?
60271f8
to
6655052
Compare
6655052
to
51f997e
Compare
c87b21b
to
1934fe4
Compare
1934fe4
to
1d0adda
Compare
Just rebased the patches from 2021 to current git tip. |
Can we split this up into a few separate PRs?
and then if we still can't figure out the test failures, the extra changes to disable tests, but I'd prefer to just go through and debug why it's not working. I think it'll make it easier to review and faster to merge in isolated pieces. |
9444af3
to
b2e9c5e
Compare
1c9362c
to
66f32e2
Compare
6645058
to
5c1c898
Compare
5c1c898
to
b7601ac
Compare
b7601ac
to
737068c
Compare
…force32bit to 55 minutes.
Error message: [FATAL src/MemoryRange.h:20:MemoryRange()] start_ <= end_
CI showed timeouts with test: madvise_fracture_flags (exceeded 120s, aarch64) nested_release (exceeded 120s) record_replay (exceeded 600s)
Remove those tests for now. No idea how that could be avoided? Error message: [FATAL src/PerfCounters.cc:785:read_ticks()] 1 (speculatively) executed strex instructions detected. Example: https://buildkite.com/julialang/rr/builds/1141 BUILDKITE_AGENT_NAME="default-armageddon.3"
Avoids error message: ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD. Affected tests: nested_detach nested_detach_kill nested_detach_kill_stuck nested_detach_stop nested_detach_wait
737068c
to
cc0d479
Compare
This are the patches of my recent address santizer work.
With these all tests pass except two.
But at least some take much longer, the whole run around 24 minutes instead of 9.
The tests
nested_detach-no-syscallbuf
andnested_detach-32-no-syscallbuf
do not finish even with 980 seconds timeout, and given the testsnested_detach
andnested_detach-32
finish also with asan in a few seconds, I guess they show a hanging test.What do you think? Is there something already suitable for merging?