t0006-notify.t fails with segfault in shell plugin #91

Closed
garlick opened this issue Oct 3, 2023 · 0 comments

garlick commented Oct 3, 2023

Problem: The first test fails with a segfault. The test output and gdb backtrace below seem to indicate a use-after-free on message in notify_shell_cb().

$ ./t0006-notify.t -v
expecting success: 
	run_timeout 30 flux run \
		${NOTIFY} --status=69 2>warn.err &&
	grep event-status=69 warn.err

not ok 1 - 1n1p event notify triggers warning on stderr
#	
#		run_timeout 30 flux run \
#			${NOTIFY} --status=69 2>warn.err &&
#		grep event-status=69 warn.err
#	

expecting success: 
	run_timeout 30 flux run \
		${NOTIFY} --status=69 --message="lorem ipsum" 2>message.err &&
	grep "lorem ipsum" message.err

0.235s: flux-shell[0]:  WARN: pmix: notify source=f2swcE4f.0 event-status=69 lorem ipsum
ok 2 - 1n1p event notify with message works

# failed 1 among 2 test(s)
1..2
Oct 03 18:09:41.842845 broker.err[0]: rc2.0: sh ./t0006-notify.t  --verbose Exited (rc=1) 2.9s
flux-start: 0 (pid 2273874) exited with rc=1

$ cat trash*/warn.err
flux-job: task(s) Segmentation fault
$

gdb backtrace snippet:

#6  0x000000558935f508 in flux_shell_log (component=0x20033a4498 "pmix", level=4, file=0x20033a4488 "notify.c", line=86, 
    fmt=<optimized out>) at log.c:201
        buf = "notify source=f2Hjxo1H.0 event-status=69 \000\000\000\000\000\000\000\030\354\301\356\177\000\000\000\000\353\301\356\177\000\000\000\000\300\060\000 \000\000\000\360%\016\241U\000\000\000`)\027\241U\000\000\000\060\353\301\356\177\000\000\000\250\363\065\211U\000\000\000\000 ;\211U\000\000\000\350,;\211U", '\000' <repeats 11 times>, "\030\354\301\356\177\000\000\000\000>:\003 \000\000\000h\000\000\000\000\000\000\000\000\353\301\356\177\000\000\000\000\300\060\000 \000\000\000\001", '\000' <repeats 15 times>, "\020>:\003 \000\000\000\000\036m\217\354B\375\240"...
        ap = {__stack = 0x7feec1fae0, __gr_top = 0x7feec1fae0, __vr_top = 0x7feec1fac0, __gr_offs = -24, __vr_offs = -128}
#7  0x00000020033a213c in notify_shell_cb () from /nfshome/garlick/proj/flux-pmix/src/shell/plugins/.libs/pmix.so
No symbol table info available.
#8  0x000000200339ff24 in interthread_recv () from /nfshome/garlick/proj/flux-pmix/src/shell/plugins/.libs/pmix.so
No symbol table info available.
#9  0x0000002000337814 in ev_invoke_pending (loop=0x200038d440 <default_loop_struct>) at ev.c:3770
        p = <optimized out>

Note: this is on the working branch for #90.

garlick added a commit to garlick/flux-pmix that referenced this issue Oct 3, 2023
Problem: notify_shell_cb() tries to unpack a json object onto
an int, causing stack corruption, and a segfault that was causing
the first test in t0006-notify.t to fail.

Fix unpack format string.

Fixes flux-framework#91
garlick added a commit to garlick/flux-pmix that referenced this issue Oct 4, 2023
Problem: notify_shell_cb() tries to unpack a json object onto
an int, causing stack corruption, and a segfault that was causing
the first test in t0006-notify.t to fail.

Fix unpack format string.

Fixes flux-framework#91
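
For readers unfamiliar with this class of bug, the following is a minimal, self-contained sketch of how a format-driven unpack call can corrupt the stack when the format character and the destination type disagree. It is not the code from notify.c and does not use the real JSON library API: `struct value`, `toy_unpack()`, and `read_status()` are hypothetical names made up for illustration, and the sketch assumes an LP64 platform where a pointer is wider than an int.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical stand-in for a parsed JSON value (not a real library type). */
struct value {
    int number;
};

/* Hypothetical varargs unpacker, loosely modeled on format-driven
 * unpack APIs.  Each format character decides what gets written
 * through the matching pointer argument:
 *   'i' stores an int (sizeof (int) bytes)
 *   'o' stores a pointer to the value (sizeof (void *) bytes)
 * The compiler cannot check that the argument matches the format.
 * Returns 0 on success, -1 on an unknown format character.
 */
int toy_unpack (const struct value *v, const char *fmt, ...)
{
    va_list ap;
    int rc = 0;

    assert (v != NULL && fmt != NULL);
    va_start (ap, fmt);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p == 'i') {
            int *dst = va_arg (ap, int *);
            *dst = v->number;
        }
        else if (*p == 'o') {
            const struct value **dst = va_arg (ap, const struct value **);
            *dst = v;
        }
        else {
            rc = -1;
            break;
        }
    }
    va_end (ap);
    return rc;
}

/* Correct pairing: 'i' with an int destination. */
int read_status (const struct value *v)
{
    int status = -1;

    /* Calling toy_unpack (v, "o", &status) here instead would store a
     * pointer-sized value through the address of a 4-byte int and
     * overwrite neighboring stack memory (undefined behavior). */
    if (toy_unpack (v, "i", &status) < 0)
        return -1;
    return status;
}
```

Because the mismatch is only visible at run time, the resulting crash can surface far from the bad call, which is consistent with the segfault appearing inside flux_shell_log() rather than at the unpack itself.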
@mergify mergify bot closed this as completed in d59ffa7 Oct 4, 2023