Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JDK23 SharedClasses.SCM01.MultiThreadMultiCL_1_FAILED NumberFormatException: For input string: "-Xdump:system+java:events=throw,filter=java/lang/NullPointerException#java/lang/invoke/BruteArgumentMoverHandle.permuteArguments*" #19439

Closed
JasonFengJ9 opened this issue May 2, 2024 · 17 comments

Comments

@JasonFengJ9
Copy link
Member

JasonFengJ9 commented May 2, 2024

Failure link

From an internal build(ubu22-aarch64-1):

12:04:04  SCC stderr openjdk version "23-internal" 2024-09-17
12:04:04  SCC stderr OpenJDK Runtime Environment (build 23-internal-adhoc.jenkins.BuildJDKnextaarch64linuxPersonal)
12:04:04  SCC stderr Eclipse OpenJ9 VM (build master-0a576e37d3f, JRE 23 Linux aarch64-64-Bit Compressed References 20240501_107 (JIT enabled, AOT enabled)
12:04:04  SCC stderr OpenJ9   - 0a576e37d3f
12:04:04  SCC stderr OMR      - 49ff22bb6af
12:04:04  SCC stderr JCL      - 887abee04bd based on jdk-23+20)

Rerun in Grinder - Change TARGET to run only the failed test targets.

Optional info

Failure output (captured from console output)

12:03:58  ===============================================
12:03:58  Running test SharedClasses.SCM01.MultiThreadMultiCL_1 ...
12:03:58  ===============================================
12:03:58  SharedClasses.SCM01.MultiThreadMultiCL_1 Start Time: Wed May  1 16:03:57 2024 Epoch Time (ms): 1714579437933
12:03:58  variation: Mode610
12:03:58  JVM_OPTIONS:  -Xcompressedrefs -Xjit -Xgcpolicy:gencon 

12:05:47  STF 16:05:46.557 - Monitoring processes: MTM1 MTM2 MTM3 MTM4 MTM5
12:05:47  MTM2 stderr java.lang.NumberFormatException: For input string: "-Xdump:system+java:events=throw,filter=java/lang/NullPointerException#java/lang/invoke/BruteArgumentMoverHandle.permuteArguments*"
12:05:47  MTM2 stderr 	at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
12:05:47  MTM2 stderr 	at java.base/java.lang.Integer.parseInt(Integer.java:588)
12:05:47  MTM2 stderr 	at java.base/java.lang.Integer.parseInt(Integer.java:685)
12:05:47  MTM2 stderr 	at net.openj9.test.sc.LoaderSlaveMultiThreadMultiCL.main(LoaderSlaveMultiThreadMultiCL.java:59)

14:07:50  STF 18:07:47.697 - Monitoring Report Summary:
14:07:50  STF 18:07:47.697 -   o Process SCC ended with the expected exit code (1)
14:07:50  STF 18:07:47.698 - TEARDOWN stage completed
14:07:50  STF 18:07:47.707 - 
14:07:50  STF 18:07:47.707 - =====================   R E S U L T S   =====================
14:07:50  STF 18:07:47.707 - Stage results:
14:07:50  STF 18:07:47.707 -   setUp:     pass
14:07:50  STF 18:07:47.707 -   execute:  *fail*
14:07:50  STF 18:07:47.707 -   teardown:  pass
14:07:50  STF 18:07:47.707 - 
14:07:50  STF 18:07:47.707 - Overall result: **FAILED**
14:07:50  -----------------------------------
14:07:50  SharedClasses.SCM01.MultiThreadMultiCL_1_FAILED

50x internal grinder - #19439 (comment)

@pshipton
Copy link
Member

pshipton commented May 3, 2024

4/5 machines failed 5/10, the other failed 3/10.

@llxia pls have someone from test take the first look since it's LoaderSlaveMultiThreadMultiCL.main() trying to parse an integer. It expects args[1] to be an int, but for some reason it's a JVM argument.

I found https://github.com/eclipse-aqavit/openj9-systemtest/blob/master/openj9.test.sharedClasses/src/test.sharedClasses/net/openj9/stf/SharedClasses.java#L257

@pshipton pshipton added this to the Java 23 milestone May 3, 2024
@pshipton
Copy link
Member

pshipton commented May 3, 2024

Last passing build
https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdknext_j9_extended.system_aarch64_linux_Personal_testList_0/8

21:08:35  Eclipse OpenJ9 VM (build master-4998aab3ce1, JRE 23 Linux aarch64-64-Bit Compressed References 20240423_105 (JIT enabled, AOT enabled)
21:08:35  OpenJ9   - 4998aab3ce1
21:08:35  OMR      - 723d2e4b637
21:08:35  JCL      - a2f1462ae3f based on jdk-23+19)

ADOPTOPENJDK_REPO=https://github.com/adoptium/aqa-tests.git
ADOPTOPENJDK_BRANCH=6ad39d2b86de28a9805906a8cdb8569cb5631c34
TKG_REPO=https://github.com/adoptium/TKG.git
TKG_BRANCH=41c3581c810f2073908cd74ebd2050d37a72cdcb
[email protected]:runtimes/test.git
test_BRANCH=76308395d82c8e77861dfb74c7b14419683deded
STF_REPO=https://github.com/adoptium/STF.git
STF_BRANCH=89020fe1391aecaa6d2cf4f5402b60d200705c2a
AQA_SYSTEMTEST_REPO=https://github.com/adoptium/aqa-systemtest.git
AQA_SYSTEMTEST_BRANCH=b68aae2bd49fa693bc7b8e8b5ae19a02244b5c75
OPENJ9_SYSTEMTEST_REPO=https://github.com/eclipse/openj9-systemtest.git
OPENJ9_SYSTEMTEST_BRANCH=e8ca4718d20d6f2f147a65de55aa2714f293c6bd

First failing build, this is a personal build.
https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdknext_j9_extended.system_aarch64_linux_Personal_testList_0/9

00:11:03  Eclipse OpenJ9 VM (build criurestoredbg-a5a74e6b325, JRE 23 Linux aarch64-64-Bit Compressed References 20240426_106 (JIT enabled, AOT enabled)
00:11:03  OpenJ9   - a5a74e6b325
00:11:03  OMR      - 33a1542b9be
00:11:03  JCL      - d40bb40155c based on jdk-23+19)

ADOPTOPENJDK_REPO=https://github.com/adoptium/aqa-tests.git
ADOPTOPENJDK_BRANCH=61ebc47e0e2ac111057fbc09b62e7a14fc5c747a
TKG_REPO=https://github.com/adoptium/TKG.git
TKG_BRANCH=8a2e0a6dace721f1888361a983aa48c619dbe68a
[email protected]:runtimes/test.git
test_BRANCH=c35046cfded0855e4cb365403271e21d70971743
STF_REPO=https://github.com/adoptium/STF.git
STF_BRANCH=89020fe1391aecaa6d2cf4f5402b60d200705c2a
AQA_SYSTEMTEST_REPO=https://github.com/adoptium/aqa-systemtest.git
AQA_SYSTEMTEST_BRANCH=b68aae2bd49fa693bc7b8e8b5ae19a02244b5c75
OPENJ9_SYSTEMTEST_REPO=https://github.com/eclipse/openj9-systemtest.git
OPENJ9_SYSTEMTEST_BRANCH=e8ca4718d20d6f2f147a65de55aa2714f293c6bd

JasonFengJ9/openj9@4998aab...a5a74e6 - SHA a5a74e6b325 no longer available
eclipse-openj9/openj9-omr@723d2e4...33a1542
ibmruntimes/openj9-openjdk-jdk@a2f1462...d40bb40
adoptium/aqa-tests@6ad39d2...61ebc47
adoptium/TKG@41c3581...8a2e0a6
https://github.ibm.com/runtimes/test/compare/76308395d82...c35046cfded0

First non-personal build failure
https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdknext_j9_extended.system_aarch64_linux_Personal_testList_1/10

11:51:32  Eclipse OpenJ9 VM (build master-0a576e37d3f, JRE 23 Linux aarch64-64-Bit Compressed References 20240501_107 (JIT enabled, AOT enabled)
11:51:32  OpenJ9   - 0a576e37d3f
11:51:32  OMR      - 49ff22bb6af
11:51:32  JCL      - 887abee04bd based on jdk-23+20)

4998aab...0a576e3
ibmruntimes/openj9-openjdk-jdk@a2f1462...887abee

@pshipton
Copy link
Member

pshipton commented May 3, 2024

Ah interesting, the NumberFormatException isn't the cause of the failure, this occurs in passing runs.

@pshipton pshipton removed the comp:test label May 3, 2024
@llxia
Copy link
Contributor

llxia commented May 3, 2024

According to the test code, the NumberFormatException will be caught. And the test will move on.
https://github.com/eclipse-aqavit/openj9-systemtest/blob/master/openj9.test.sharedClasses/src/test.sharedClasses/net/openj9/test/sc/LoaderSlaveMultiThreadMultiCL.java#L57-L64

Details:
Two args ( jarName and int threadsMAX) are expected in the tests (i.e., /home/jenkins/workspace/Grinder/aqa-tests/TKG/output_1714758670750/SharedClasses.SCM01.MultiThreadMultiCL_1/20240503-175111-SharedClasses/tmp/classes.jar 120). However, in some cases, extra arg is added https://github.com/eclipse-aqavit/openj9-systemtest/blob/master/openj9.test.sharedClasses/src/test.sharedClasses/net/openj9/stf/SharedClasses.java#L257. The args[] becomes - /home/jenkins/workspace/Grinder/aqa-tests/TKG/output_1714758670750/SharedClasses.SCM01.MultiThreadMultiCL_1/20240503-175111-SharedClasses/tmp/classes.jar -Xdump:system+java:events=throw,filter=java/lang/NullPointerException#java/lang/invoke/BruteArgumentMoverHandle.permuteArguments* 120. args[1] is no longer int. Therefore, we have NumberFormatException. As mentioned above, this does not fail the test.

@llxia
Copy link
Contributor

llxia commented May 3, 2024

I do not think this PR - Add new debug options for sc mt test is valid: adoptium/openj9-systemtest@df3753a. Instead of adding -Xdump:system+java:events=throw,filter=java/lang/NullPointerException#java/lang/invoke/BruteArgumentMoverHandle.permuteArguments* to args, I think we should add it to addJvmOption.

@annaibm could you give it a try? Thanks

@pshipton
Copy link
Member

pshipton commented May 3, 2024

Actual failure **FAILED** Process MTM1 has timed out

Looking at the core for
https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdknext_j9_extended.system_aarch64_linux_Personal_testList_0/9
SharedClasses.SCM23.SingleCL_1

id: 0xffffa0082528 name: VM class loader modules (classLoaderModuleAndLocationMutex)
        owner thread id: 2466573 name: main
        waiting thread id: 2466632 name: Common-Cleaner

id: 0xffffa0082698 name: VM class table (classTableMutex)
        owner thread id: 2466632 name: Common-Cleaner
        waiting thread id: 2466573 name: main
        waiting thread id: 2466602 name: vmthread @79616
        waiting thread id: 2466622 name: vmthread @156160

    name:          main
    Thread object: java/lang/Thread @ 0xfff040e0
    ID:            2 (0x2)
    Priority:      -1
    Thread.State:  BLOCKED
    JVMTI state:   ALIVE BLOCKED_ON_MONITOR_ENTER
      waiting to enter: "VM class table" with ID 0xffffa0082698 owner name: "Common-Cleaner" owner id: 2466632
    Java stack frames:
     bp: 0x000000000010f150  method: void java/lang/Module.defineModule0(Module, boolean, String, String, Object[])  (Native Method)
      objects: 0xfff39f20 0x84076060 0xfff39f98 0xfff39fc8                                                                   bp: 0x000000000010f1b8  method: void java/lang/Module.<init>(ModuleLayer, ClassLoader, module.ModuleDescriptor, java.net.URI)  source: Module.java:145                                                                                           objects: <no objects in this frame>                                                                                    bp: 0x000000000010f1f8  method: Module java/lang/Access.defineModule(ClassLoader, module.ModuleDescriptor, java.net.URI)  source: Access.java:244                                                                                                objects: 0xfff39f20                                                                                                    bp: 0x000000000010f228  method: Module jdk/internal/module/Modules.defineModule(ClassLoader, module.ModuleDescriptor, java.net.URI)  source: Modules.java:82                                                                                     objects: <no objects in this frame>                                                                                    bp: 0x000000000010f328  method: ModuleLayer jdk/internal/module/ModuleBootstrap.boot2()  source: ModuleBootstrap.java:274                                                                                                                        objects: 0xfff00510 0xfff00510 0xfff1a028 0xfff32368                                                                   bp: 0x000000000010f350  method: ModuleLayer jdk/internal/module/ModuleBootstrap.boot()  source: ModuleBootstrap.java:179                                                                                                                         objects: <no objects in this frame>                                                                                    bp: 0x000000000010f398  method: void java/lang/ClassLoader.initializeClassLoaders()  source: ClassLoader.java:213
      objects: <no objects in this frame>
     bp: 0x000000000010f408  method: void java/lang/Thread.initialize(boolean, ThreadGroup, Thread, java.security.AccessControlContext, int)  source: Thread.java:3080
      objects: 0xfff040e0 0xfff04170
     bp: 0x000000000010f458  method: void java/lang/Thread.<init>(String, Object, int, boolean)  source: Thread.java:3162
      objects: 0xfff040e0

    name:          Common-Cleaner
    Thread object: jdk/internal/misc/InnocuousThread @ 0xfff35f70
    ID:            3 (0x3)
    Priority:      -1
    Thread.State:  BLOCKED
    JVMTI state:   ALIVE BLOCKED_ON_MONITOR_ENTER
      waiting to enter: "VM class loader modules" with ID 0xffffa0082528 owner name: "main" owner id: 2466573
    Java stack frames:
     bp: 0x0000000000140ce8  method: Class java/lang/VMAccess.findClassOrNull(String, ClassLoader)  (Native Method)
      objects: 0xfff41010 0xfff0ee80
     bp: 0x0000000000140d18  method: Class java/lang/ClassLoader.loadClass(String)  source: ClassLoader.java:1093
      objects: 0xfff41010
     bp: 0x0000000000140da8  method: void java/lang/Object.wait(long, int)  source: Object.java:253
      objects: 0xfff358d0
     bp: 0x0000000000140dd8  method: void java/lang/Object.wait(long)  source: Object.java:221
      objects: <no objects in this frame>
     bp: 0x0000000000140e20  method: ref.Reference java/lang/ref/ReferenceQueue.remove(long)  source: ReferenceQueue.java:137
      objects: 0xfff358d0 0xfff358d0
     bp: 0x0000000000140e58  method: void jdk/internal/ref/CleanerImpl.run()  source: CleanerImpl.java:140
      objects: 0xfff358c0 0xfff35f70
     bp: 0x0000000000140e88  method: void java/lang/Thread.runWith(Object, Runnable)  source: Thread.java:1600
      objects: 0x840727c8
     bp: 0x0000000000140eb8  method: void java/lang/Thread.run()  source: Thread.java:1587
      objects: <no objects in this frame>
     bp: 0x0000000000140ed8  method: void jdk/internal/misc/InnocuousThread.run()  source: InnocuousThread.java:186
      objects: <no objects in this frame>

threads.txt

@pshipton
Copy link
Member

pshipton commented May 3, 2024

@keithc-ca fyi this is why #19417 made a difference, it avoided loading Blocker early in bootstrap, which is no longer loaded elsewhere.

@pshipton
Copy link
Member

pshipton commented May 3, 2024

@tajila pls assign, the deadlock condition needs to be fixed.

@pshipton
Copy link
Member

pshipton commented May 3, 2024

There is no core file for the reported failure, but the earlier failure on a personal build has one and shows we have a deadlock condition. Although the java code was modified after the personal build by #19417 to avoid loading Blocker, I assume this deadlock problem is lurking and may still show up causing hangs. We could try to repeat the problem on the latest builds and get a core file.

I don't have native stack traces, but this could be what's going on.

findClassOrNull() calls internalFindClassString(), calls internalFindClassInModule(),
calls loadNonArrayClass() and on classsupport.c : 1034 locks classTableMutex
calls arbitratedLoadClass(), calls callLoadClass(), calls internalFindKnownClass()
internalFindKnownClass() locks classLoaderModuleAndLocationMutex on line 1260

JVM_DefineModule() locks classLoaderModuleAndLocationMutex at the start of the function java11vmi.c : 768
calls allClassesStartDo() which locks classTableMutex

@pshipton
Copy link
Member

pshipton commented May 3, 2024

@llxia @annaibm whatever problem we were trying to get diagnostics for via adoptium/openj9-systemtest@df3753a seems obsolete now. I think we should remove the change rather than fix it.

Adding -Xdump options changes the behavior and can obscure problems, it's not ideal to run with these options all the time.

@tajila
Copy link
Contributor

tajila commented May 3, 2024

I don't have native stack traces, but this could be what's going on.

If its easily reproduceable I can make a patch to address that case.

@pshipton
Copy link
Member

pshipton commented May 5, 2024

If its easily reproduceable I can make a patch to address that case.

It failed 23/50 in the grinder.

annaibm added a commit to annaibm/openj9-systemtest that referenced this issue May 6, 2024
- Remove `-Xdump:system+java:events=throw,filter=java/lang/NullPointerException#java/lang/invoke/BruteArgumentMoverHandle.permuteArguments*` from args\

related: eclipse-openj9/openj9#19439

Signed-off-by: Anna Babu Palathingal <[email protected]>
annaibm added a commit to annaibm/openj9-systemtest that referenced this issue May 6, 2024
- Remove `-Xdump:system+java:events=throw,filter=java/lang/NullPointerException#java/lang/invoke/BruteArgumentMoverHandle.permuteArguments*` from args\

related: eclipse-openj9/openj9#19439

Signed-off-by: Anna Babu Palathingal <[email protected]>
@tajila tajila self-assigned this May 6, 2024
annaibm added a commit to annaibm/openj9-systemtest that referenced this issue May 6, 2024
- Remove `-Xdump:system+java:events=throw,filter=java/lang/NullPointerException#java/lang/invoke/BruteArgumentMoverHandle.permuteArguments*` from args\

related: eclipse-openj9/openj9#19439

Signed-off-by: Anna Babu Palathingal <[email protected]>
tajila added a commit to tajila/openj9 that referenced this issue May 7, 2024
Thread 1 creates java.base:
- acquires module (classLoaderModuleAndLocationMutex) mutex
- acquires class mutex (classTableMutex) to update all bootstrap classes

Thread 2 loads a class
- aquires class mutex
- acquire module mutex to search for class location

There is a scenario where Thread 1 acquires module mutex, then thread 2
acquires class mutex. Thread 1 cannot complete creating java.base
because it needs the class mutex and Thread 2 cannot complete class
loading because it needs the module mutex.

Proposed solution is for Thread 1 to acquire the class mutex first if
it is known that it will be needed.

Related: eclipse-openj9#19439

Signed-off-by: tajila <[email protected]>
@tajila
Copy link
Contributor

tajila commented May 7, 2024

A grinder with test build https://hyc-runtimes-jenkins.swg-devops.com/job/Grinder/40386 (rerun)

@tajila
Copy link
Contributor

tajila commented May 9, 2024

@llxia
Copy link
Contributor

llxia commented May 9, 2024

just for the record, re #19439 (comment), -Xdump options is removed. See adoptium/openj9-systemtest#153

@tajila
Copy link
Contributor

tajila commented May 9, 2024

Passed 50/50 with the #19458 changes

@tajila
Copy link
Contributor

tajila commented May 13, 2024

Closing this as changes are in

@tajila tajila closed this as completed May 13, 2024
AswathySK pushed a commit to AswathySK/openj9 that referenced this issue Jun 4, 2024
Thread 1 creates java.base:
- acquires module (classLoaderModuleAndLocationMutex) mutex
- acquires class mutex (classTableMutex) to update all bootstrap classes

Thread 2 loads a class
- aquires class mutex
- acquire module mutex to search for class location

There is a scenario where Thread 1 acquires module mutex, then thread 2
acquires class mutex. Thread 1 cannot complete creating java.base
because it needs the class mutex and Thread 2 cannot complete class
loading because it needs the module mutex.

Proposed solution is for Thread 1 to acquire the class mutex first if
it is known that it will be needed.

Related: eclipse-openj9#19439

Signed-off-by: tajila <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants