JDK16/J9 extended functional failure on test-osuosl-aix72-ppc64-1 #2059

Closed
sxa opened this issue Mar 20, 2021 · 12 comments

Comments

@sxa
Member

sxa commented Mar 20, 2021

Please set the title to indicate the test name and machine name where known.

To make it easy for the infrastructure team to repeat and diagnose, please
answer the following questions:

  • test suite/name? JDK16 OpenJ9 extended.functional suite
  • Is there an existing issue elsewhere covering this? Not to my knowledge
  • Which machine(s) does it work on? It has not failed fatally on aix72-2, build-2, ibm-2, or on this machine previously
  • Which machine(s) does it fail on? test-osuosl-aix72-ppc64-1
  • Do you have a link to a Grinder re-run of the test with the failure? https://ci.adoptopenjdk.net/job/Test_openjdk16_j9_extended.functional_ppc64_aix/40/ was the failure and it can be re-run from there

Any other details: Potential out-of-disk-space issue. As I write this, I have freed up some space in the Jenkins home directory, which may have released the job so that it reached its failure state. I haven't looked at the log, but I am noting this for whoever picks it up. The time on the machine is currently 0447, and a 3GB core file was present in /tmp which was created at 2009 yesterday. The test took 8h44, so the core likely came from the test. It may be a simple out-of-disk-space issue in the Jenkins home directory that caused this, but we should verify.

FYI to anyone who might be looking at results @smlambert @andrew-m-leonard - I'll re-run it on the same machine at https://ci.adoptopenjdk.net/job/Test_openjdk16_j9_extended.functional_ppc64_aix/41/

@andrew-m-leonard
Contributor

20:48:56  gzip: stdout: No space left on device
20:48:56  tar: -: Cannot write: There is no process to read data written to a pipe.
20:48:56  tar: Error is not recoverable: exiting now
Body did not finish within grace period; terminating with extreme prejudice

@sxa
Member Author

sxa commented Mar 22, 2021

The machine is not currently in a space-restricted state. My suspicion would be that with adjustments to make core dumps work on AIX we're filling up the file system during the tests.
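
As a rough illustration of how the disk-space suspicion could be checked on a machine, here is a minimal Python sketch. It is not part of the test infrastructure: the function names, the directory arguments and the size threshold are all hypothetical, and it only looks at files directly inside the given directory.

```python
import shutil
from pathlib import Path
from typing import List, Tuple


def free_gib(path: str) -> float:
    """Return the free space, in GiB, on the file system holding `path`."""
    return shutil.disk_usage(path).free / 2**30


def enough_space(path: str, required_gib: float) -> bool:
    """True if the file system holding `path` has at least `required_gib` free."""
    return free_gib(path) >= required_gib


def large_files(directory: str, min_bytes: int) -> List[Tuple[str, int]]:
    """List (path, size) for regular files directly in `directory` that are
    at least `min_bytes` in size, largest first (e.g. leftover core files)."""
    found = []
    for entry in Path(directory).iterdir():
        try:
            if entry.is_file() and not entry.is_symlink():
                size = entry.stat().st_size
                if size >= min_bytes:
                    found.append((str(entry), size))
        except OSError:
            # The file may vanish or be unreadable while we scan; skip it.
            continue
    return sorted(found, key=lambda item: item[1], reverse=True)
```

For example, `large_files("/tmp", 1 * 2**30)` would list anything of 1 GiB or more sitting in /tmp, which is where the 3GB core file mentioned in the description was found, and `enough_space` could be called on the Jenkins home directory before a long run. The threshold to use is a judgment call.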

@aixtools
Contributor

Is this related to the test: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/7978/ ?

If so, I have repeated it using both jdk11 and jdk16 (#7999, #8000, #8001) and they also finish as unstable.
If so, I have something I want to try - but will need to install a lower level AIX 7.2 (e.g., on ojdk06)

If not related to this test - just disregard.

@sxa
Member Author

sxa commented Mar 25, 2021

I don't know for sure which tests it was - it was just observed on the full extended.functional suite as linked in the initial description.

@aixtools
Contributor

The links in the initial post are no longer available. Please generate a new one.

@aixtools
Contributor

aixtools commented Apr 22, 2021

Hmm, cannot get the parameters right.

Restarting from: https://ci.adoptopenjdk.net/job/Test_openjdk16_j9_extended.functional_ppc64_aix/59/ - this passed on aix71 so shall rerun on aix72

  • fails in yet another way

** So, repeating my request from 3 days back: generate a new set of parameters to test this. I do not comprehend all the different settings - and 'rerun' is no guarantee for success.

@sxa
Member Author

sxa commented Apr 22, 2021

* Re-running - based on upstream - https://ci.adoptopenjdk.net/job/Grinder/250/
* Re-running - based on nightly - (upstream failed) - https://ci.adoptopenjdk.net/job/Grinder/251/

That has BUILD_LIST=openjdk when you're trying to run functional tests. Needs BUILD_LIST=functional otherwise it does not build the correct tests, so it can't run anything in there which is why you're getting a summary of zero tests run:

08:35:49  TOTAL: 0   EXECUTED: 0   PASSED: 0   FAILED: 0   DISABLED: 0   SKIPPED: 0
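
A run that built the wrong test list can be spotted from that summary line alone. The following is a minimal Python sketch, assuming the summary always has the six fields shown in the line above; the helper names are hypothetical and not part of any Jenkins or test-framework API.

```python
import re
from typing import Dict, Optional

# Matches a summary line in the shape quoted above (timestamp prefix is ignored).
SUMMARY_RE = re.compile(
    r"TOTAL:\s*(\d+)\s+EXECUTED:\s*(\d+)\s+PASSED:\s*(\d+)\s+"
    r"FAILED:\s*(\d+)\s+DISABLED:\s*(\d+)\s+SKIPPED:\s*(\d+)"
)
FIELDS = ("TOTAL", "EXECUTED", "PASSED", "FAILED", "DISABLED", "SKIPPED")


def parse_summary(log_text: str) -> Optional[Dict[str, int]]:
    """Return the first test summary found in `log_text`, or None if absent."""
    match = SUMMARY_RE.search(log_text)
    if match is None:
        return None
    return dict(zip(FIELDS, (int(value) for value in match.groups())))


def ran_no_tests(log_text: str) -> bool:
    """True when a summary is present and reports zero tests in total."""
    summary = parse_summary(log_text)
    return summary is not None and summary["TOTAL"] == 0
```

If `ran_no_tests` returns True for a console log, a mismatch between BUILD_LIST and the requested target is one likely cause worth checking first.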

> Restarting from: https://ci.adoptopenjdk.net/job/Test_openjdk16_j9_extended.functional_ppc64_aix/59/ - this passed on aix71 so shall rerun on aix72
> * fails in yet another way

I presume that's Grinder 253 that you're referring to? That one correctly has BUILD_LIST=functional but is pulling from "upstream" job 23 of https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk16u/job/jdk16u-aix-ppc64-openj9/. Since we only keep artifacts in Jenkins from the last job and we're now on #25, those artifacts have already been removed, which is why you get this message saying it failed to copy the artifacts from the job:

ERROR: Failed to copy artifacts from build-scripts/jobs/jdk16u/jdk16u-aix-ppc64-openj9 with filter: **/*.tar.gz,**/*.tgz,**/*.zip,**/*.jar,**/*.Z

In most cases you'll get away with it if you're re-running within a day or so of the original build job (since we build nightlies 4 times a week), but in this case, since we're running release pipelines, they're being removed sooner. To be safe, unless you need a specific build, it's probably better to set the jobs to take nightly builds via the API instead of upstream from Jenkins.
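
The retention reasoning above can be written down as a small check. This is only a sketch in Python: the function names and the "upstream"/"nightly" return strings are hypothetical labels for the two choices described here, and the default of one retained build simply mirrors the "last job only" statement in this comment.

```python
def upstream_artifacts_available(requested_build: int,
                                 latest_build: int,
                                 builds_kept: int = 1) -> bool:
    """True if `requested_build` is among the most recent `builds_kept`
    builds, i.e. its artifacts should still be retained."""
    if builds_kept < 1:
        return False
    oldest_kept = latest_build - builds_kept + 1
    return oldest_kept <= requested_build <= latest_build


def choose_sdk_source(requested_build: int, latest_build: int,
                      builds_kept: int = 1) -> str:
    """Suggest 'upstream' while the artifacts still exist, else 'nightly'."""
    if upstream_artifacts_available(requested_build, latest_build, builds_kept):
        return "upstream"
    return "nightly"
```

With the numbers from this thread, `choose_sdk_source(23, 25)` returns "nightly", because build 23 is no longer the most recent build of the upstream job.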

@aixtools
Contributor

aixtools commented May 3, 2021

  • I have reviewed the failing test as best I can and summarized it in the attached file.
  • I do not know Java well enough to dig deeper, but I will work with someone who knows Java and is perhaps not comfortable with AIX
  • The keyword I looked at in the file is: testShrcAOTStatsExt
    issue_2059.txt

@karianna karianna modified the milestones: April 2021, May 2021 May 4, 2021
@Haroon-Khel Haroon-Khel modified the milestones: May 2021, June 2021 Jun 21, 2021
@sxa sxa modified the milestones: June 2021, July 2021 Jul 5, 2021
@Haroon-Khel Haroon-Khel modified the milestones: July 2021, August 2021 Aug 4, 2021
@sxa
Member Author

sxa commented Sep 23, 2021

Can someone (@Haroon-Khel / @aixtools) verify whether this week's reinstall of this machine has resolved the issue please?

@sxa sxa modified the milestones: August 2021, September 2021 Sep 23, 2021
@aixtools
Contributor

Hmm, that was months ago. It will probably need a complete refresh of all the work, as I will have forgotten the details by now.

Note: the re-install took the system to a different TL - AIX 7.2 TL4, rather than AIX 7.2 TL2 (SP2). Not sure if that is the confirmation you are looking for - even if the test now passes.

@aixtools
Contributor

It seems - whatever was wrong - is okay now.

With BUILD_LIST=functional and TARGET=extended (and extended.functional) - all passed on 4 AIX systems.

See https://ci.adoptopenjdk.net/job/Grinder/1909/ through https://ci.adoptopenjdk.net/job/Grinder/1916/

@sxa
Member Author

sxa commented Nov 19, 2021

Closing as discussed in slack as these are now passing :-)

@sxa sxa closed this as completed Nov 19, 2021