Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improvements to debug tests infrastructure to help with triaging process #531

Merged
merged 3 commits into from
Mar 1, 2024

Conversation

aap-sc
Copy link
Collaborator

@aap-sc aap-sc commented Feb 1, 2024

This improvement fixes several issues:

  1. print selected seed. Previously the seed was not printed and this
    created problems with reproduction of the issues. It's still not an
    ideal - meaning interactions between spike/gdb/openocd are inherently
    non-determistic (since time is involved), but at least we should get
    the same sources for the same seed now.

  2. introduced a new option to log communications over GDB remote serial
    protocol which is helpful for debugging some tests.

  3. fixes setting of remotetimeout. It was silently overwritten by
    default values from platform definition even if user specified one.

Previously the seed was not printed and this created problems with
reproduction of the issues. It's still not an ideal - meaning
interactions between spike/gdb/openocd are inherently non-determistic
(since time is involved), but at least we should get the same sources
for the same seed now.
introduce a new option to log communications over GDB remote serial
protocol which is helpful for debugging some tests.
fixes setting of `remotetimeout`. It was silently overwritten by default
values from platform definition even if user specified one.
@aap-sc
Copy link
Collaborator Author

aap-sc commented Feb 1, 2024

This helped with debugging some sporadic failures of DownloadTest (incorrect remotetimeout was selected) and SV39Test (riscv-collab/riscv-openocd#1010)

@aap-sc
Copy link
Collaborator Author

aap-sc commented Feb 1, 2024

regarding the pipeline failure:

Run SPIKE_COMMIT=$( git ls-remote "$SPIKE_REPO" $SPIKE_REV | awk '{ print $1; }' )
Revison of Spike: 7c890632ec91104fe6ccd9f16f70842c4412a1fd
Revision of OpenOCD: 87331a82a29a5aeb222543b6876e0208be70ea41
ec28cf03aef7539ff8c02db4a9ac558545d4e7c6
Error: Unable to process file command 'env' successfully.
Error: Invalid format 'ec28cf03aef7539ff8c02db4a9ac558545d4e7c6'

This does not look like a problem with this MR, but rather some kind of infrastructural problem.

@timsifive could you please advise who could review this MR ?

@aap-sc
Copy link
Collaborator Author

aap-sc commented Feb 7, 2024

@rtwfroody (looks like a new account of @timsifive ) and maybe @en-sc - could you take a look please? I'm also not quite sure who should merge these commits now.

Copy link
Collaborator

@en-sc en-sc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@aap-sc
Copy link
Collaborator Author

aap-sc commented Feb 7, 2024

@aswaterman could you take a look?

@en-sc en-sc merged commit a7c1a56 into riscv-software-src:master Mar 1, 2024
2 checks passed
@TommyMurphyTM1234
Copy link
Contributor

TommyMurphyTM1234 commented Jun 5, 2024

This change to gdbserver.py:

    # initialize PRNG
    selected_seed = parsed.seed
    if parsed.seed is None:
        selected_seed = int(datetime.now().timestamp())
        print(f"PRNG seed for {target.name} is generated automatically")
    print(f"PRNG seed for {target.name} is {selected_seed}")
    random.seed(selected_seed)

seems to cause problems with the Makefile driven tests:

make
./gdbserver.py \
	./targets/RISC-V/spike64.py \
	PRNG \
	--isolate \
	--print-failures \
	 \
	--remotelogfile-enable --server_cmd "openocd -d3"
PRNG seed for spike64 is generated automatically
PRNG seed for spike64 is 1717624513
Using $misa from hart definition: 0x8000000000141125
::::::::::::::::::::::::::::[ ran 0 tests in 0s ]:::::::::::::::::::::::::::::
./gdbserver.py \
	./targets/RISC-V/spike64.py \
	seed \
	--isolate \
	--print-failures \
	 \
	--remotelogfile-enable --server_cmd "openocd -d3"
PRNG seed for spike64 is generated automatically
PRNG seed for spike64 is 1717624513
Using $misa from hart definition: 0x8000000000141125
::::::::::::::::::::::::::::[ ran 0 tests in 0s ]:::::::::::::::::::::::::::::
./gdbserver.py \
	./targets/RISC-V/spike64.py \
	for \
	--isolate \
	--print-failures \
	 \
	--remotelogfile-enable --server_cmd "openocd -d3"
PRNG seed for spike64 is generated automatically
PRNG seed for spike64 is 1717624513
Using $misa from hart definition: 0x8000000000141125
::::::::::::::::::::::::::::[ ran 0 tests in 0s ]:::::::::::::::::::::::::::::
./gdbserver.py \
	./targets/RISC-V/spike64.py \
	spike32 \
	--isolate \
	--print-failures \
	 \
	--remotelogfile-enable --server_cmd "openocd -d3"
PRNG seed for spike64 is generated automatically
PRNG seed for spike64 is 1717624514
Using $misa from hart definition: 0x8000000000141125
::::::::::::::::::::::::::::[ ran 0 tests in 0s ]:::::::::::::::::::::::::::::
./gdbserver.py \
	./targets/RISC-V/spike64.py \
	is \
	--isolate \
	--print-failures \
	 \
	--remotelogfile-enable --server_cmd "openocd -d3"
PRNG seed for spike64 is generated automatically
PRNG seed for spike64 is 1717624514
Using $misa from hart definition: 0x8000000000141125
[CheckMisa] Starting > logs/20240605-225514-spike64-CheckMisa.log
[CheckMisa] pass in 2.02s
[CustomRegisterTest] Starting > logs/20240605-225516-spike64-CustomRegisterTest.log
...

because of the first two lines here:

./gdbserver.py --list-tests targets/RISC-V/spike64.py
PRNG seed for spike64 is generated automatically
PRNG seed for spike64 is 1717624581
CeaseStepiTest
CheckMisa
CustomRegisterTest
DebugBreakpoint
DebugChangeString
...

Perhaps those "PRNG seed ..." lines were just for debugging and were supposed to be removed before the commit/PR was merged?

@aap-sc
Copy link
Collaborator Author

aap-sc commented Jun 5, 2024

seems to cause problems with the Makefile driven tests:

Actually this is intentional. The intention is to print the PRNG seed so one can reproduce the run. What is the issue execatly?

@TommyMurphyTM1234
Copy link
Contributor

seems to cause problems with the Makefile driven tests:

Actually this is intentional. The intention is to print the PRNG seed so one can reproduce the run. What is the issue execatly?

The PRNG seed ... log messages are polluting the output of gdbserver.py --list-test ....

The output should be:

./gdbserver.py --list-tests targets/RISC-V/spike64.py
CeaseStepiTest
CheckMisa
...

not

./gdbserver.py --list-tests targets/RISC-V/spike64.py
PRNG seed for spike64 is generated automatically
PRNG seed for spike64 is 1717624581
CeaseStepiTest
CheckMisa
...

When those two log messages appear in the list then we get this which is simply wrong:

make
./gdbserver.py \
	./targets/RISC-V/spike64.py \
	PRNG \
	--isolate \
	--print-failures \
	 \
	--remotelogfile-enable --server_cmd "openocd -d3"
PRNG seed for spike64 is generated automatically
PRNG seed for spike64 is 1717624513
Using $misa from hart definition: 0x8000000000141125
::::::::::::::::::::::::::::[ ran 0 tests in 0s ]:::::::::::::::::::::::::::::
./gdbserver.py \
	./targets/RISC-V/spike64.py \
	seed \
	--isolate \
	--print-failures \
	 \
	--remotelogfile-enable --server_cmd "openocd -d3"
PRNG seed for spike64 is generated automatically
PRNG seed for spike64 is 1717624513
Using $misa from hart definition: 0x8000000000141125
::::::::::::::::::::::::::::[ ran 0 tests in 0s ]:::::::::::::::::::::::::::::
./gdbserver.py \
	./targets/RISC-V/spike64.py \
	for \
	--isolate \
	--print-failures \
	 \
	--remotelogfile-enable --server_cmd "openocd -d3"
PRNG seed for spike64 is generated automatically
PRNG seed for spike64 is 1717624513
Using $misa from hart definition: 0x8000000000141125
::::::::::::::::::::::::::::[ ran 0 tests in 0s ]:::::::::::::::::::::::::::::
./gdbserver.py \
	./targets/RISC-V/spike64.py \
	spike32 \
	--isolate \
	--print-failures \
	 \
	--remotelogfile-enable --server_cmd "openocd -d3"
PRNG seed for spike64 is generated automatically
PRNG seed for spike64 is 1717624514
Using $misa from hart definition: 0x8000000000141125
::::::::::::::::::::::::::::[ ran 0 tests in 0s ]:::::::::::::::::::::::::::::
./gdbserver.py \
	./targets/RISC-V/spike64.py \
	is \
	--isolate \
	--print-failures \
	 \
	--remotelogfile-enable --server_cmd "openocd -d3"
PRNG seed for spike64 is generated automatically
PRNG seed for spike64 is 1717624514
Using $misa from hart definition: 0x8000000000141125
[CheckMisa] Starting > logs/20240605-225514-spike64-CheckMisa.log
[CheckMisa] pass in 2.02s
[CustomRegisterTest] Starting > logs/20240605-225516-spike64-CustomRegisterTest.log
...

@TommyMurphyTM1234
Copy link
Contributor

If the PRNG seed ... log messages are required then they should probably be masked/suppressed when --list-tests is used.

@TommyMurphyTM1234
Copy link
Contributor

Something like this?

    # initialize PRNG
    selected_seed = parsed.seed
    if parsed.seed is None:
        selected_seed = int(datetime.now().timestamp())
        if parsed.list_tests is None:
            print(f"PRNG seed for {target.name} is generated automatically")
    if parsed.list_tests is None:
        print(f"PRNG seed for {target.name} is {selected_seed}")
    random.seed(selected_seed)

@aap-sc
Copy link
Collaborator Author

aap-sc commented Jun 5, 2024

Ahhh. It's just that I've never used this switch. According to the description of this option:

Print out a list of tests, and exit immediately.

Now I see. This parameter implies that no tests are run.

If the PRNG seed ... log messages are required then they should probably be masked/suppressed when --list-tests is used.

Yeah... We should not print that if "list-tests" is specified. Will fix

@TommyMurphyTM1234
Copy link
Contributor

TommyMurphyTM1234 commented Jun 5, 2024

Sorry - my post crossed with yours but maybe my suggested code change above is adequate for a fix?

@aap-sc
Copy link
Collaborator Author

aap-sc commented Jun 5, 2024

Sorry - my post crossed with yours but maybe my suggeted code change above is adequate for a fix?

Yep. It looks like it is.

@TommyMurphyTM1234
Copy link
Contributor

I can do a PR for that change tomorrow (it's late here and I need to rest :-) if you want but if you get there before me then that's obviously fine too. ;-)

@aap-sc
Copy link
Collaborator Author

aap-sc commented Jun 5, 2024

if you want but if you get there before me then that's obviously fine too. ;-)

me tired. me wants sleep. me happy if you create the change :).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants