Helidon should not start JFR recording by default #9687

Open
danielkec opened this issue Jan 24, 2025 · 4 comments

Labels: bug (Something isn't working), metrics, performance

Comments

danielkec (Contributor) commented Jan 24, 2025

It looks like VThreadSystemMetersProvider starts a JFR recording by default, which could have a considerable performance impact. The recording stream is not properly closed on Helidon shutdown.

It also keeps a tmp file open, which blocks CRaC snapshotting.

jdk.internal.crac.mirror.CheckpointException
        Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenFileException: /tmp/2025_01_24_18_29_26_28322/2025_01_24_18_29_26.jfr
                at java.base/jdk.internal.crac.JDKFileResource.lambda$beforeCheckpoint$1(JDKFileResource.java:115)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:170)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:307)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestoreInternal(Core.java:320)
        Caused by: java.lang.Exception: This file descriptor was created by main at epoch:1737739766969 here
                at java.base/jdk.internal.crac.JDKFdResource.<init>(JDKFdResource.java:61)
                at java.base/jdk.internal.crac.JDKFileResource.<init>(JDKFileResource.java:41)
                at java.base/java.io.RandomAccessFile$1.<init>(RandomAccessFile.java:105)
                at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:105)
                at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:265)
                at jdk.jfr/jdk.jfr.internal.SecuritySupport.lambda$createRandomAccessFile$13(SecuritySupport.java:381)
                at jdk.jfr/jdk.jfr.internal.SecuritySupport$1.run(SecuritySupport.java:235)
                at java.base/java.security.AccessController.doPrivileged(AccessController.java:714)
                at jdk.jfr/jdk.jfr.internal.SecuritySupport.doPrivilegedIOWithReturn(SecuritySupport.java:232)
                at jdk.jfr/jdk.jfr.internal.SecuritySupport.createRandomAccessFile(SecuritySupport.java:381)
                at jdk.jfr/jdk.jfr.internal.RepositoryChunk.<init>(RepositoryChunk.java:58)
                at jdk.jfr/jdk.jfr.internal.Repository.newChunk(Repository.java:98)
                at jdk.jfr/jdk.jfr.internal.PlatformRecorder.start(PlatformRecorder.java:236)
                at jdk.jfr/jdk.jfr.internal.PlatformRecording.start(PlatformRecording.java:127)
                at jdk.jfr/jdk.jfr.consumer.RecordingStream.startAsync(RecordingStream.java:379)
                at io.helidon.metrics.systemmeters.VThreadSystemMetersProvider.meterBuilders(VThreadSystemMetersProvider.java:117)
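
(For context, a minimal standalone sketch of the kind of JFR usage involved here. This is not Helidon's actual code; it only assumes the standard jdk.jfr.consumer.RecordingStream API and the two virtual-thread events mentioned in the comments below.)

```java
import java.util.concurrent.atomic.LongAdder;

import jdk.jfr.consumer.RecordingStream;

public class VThreadRecordingSketch {
    public static void main(String[] args) {
        LongAdder pinned = new LongAdder();
        LongAdder submitFailed = new LongAdder();

        // Starting the stream asynchronously makes JFR create a repository
        // chunk file under /tmp, which stays open until the stream is closed.
        RecordingStream rs = new RecordingStream();
        rs.enable("jdk.VirtualThreadPinned");
        rs.enable("jdk.VirtualThreadSubmitFailed");
        rs.onEvent("jdk.VirtualThreadPinned", e -> pinned.increment());
        rs.onEvent("jdk.VirtualThreadSubmitFailed", e -> submitFailed.increment());
        rs.startAsync();

        // ... meters backed by the counters above would be registered here ...

        // Without an explicit close(), the tmp file stays open, which is what
        // the CRaC open-file check reports in the stack trace above.
        rs.close();
    }
}
```
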
tjquinno (Member) commented Jan 24, 2025

  1. It would be good to see numbers quantifying the performance impact in the default case.

    When Helidon creates a RecordingStream for the virtual thread metrics, by default it purposefully subscribes only to two rare events: when a virtual thread is pinned and when a virtual thread cannot be submitted.

    Only if the user changes a default config setting will Helidon also subscribe to virtual thread start and stop events, which could indeed have a significant performance impact. Our documentation warns users about this.

  2. At the time we added the virtual thread metrics there was not a convenient way to know when the server is shutting down. IIRC that notification requires having a routing instance. Although reporting of metrics at /metrics or /observe/metrics of course involves routing, simply registering and internally querying meters does not. As a result, there is no simple and effective way for the metrics code to know when to explicitly close the RecordingStream. (And I don't think we are supposed to be using JVM shutdown hooks in Helidon.)

    This matters only at server shutdown, and any file underlying the recording stream would be closed when the JVM exits anyway.

    It would be great to explicitly close the recording stream during server shutdown. We need a good way to do that, though. If I've missed it I'd love to be corrected.
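
(To illustrate the shape of the explicit close being discussed — a sketch only, assuming some server-stop notification existed; onServerStop is a placeholder name, not an existing Helidon API.)

```java
import jdk.jfr.consumer.RecordingStream;

final class VThreadMetersShutdownSketch {
    private final RecordingStream recordingStream = new RecordingStream();

    // Hypothetical lifecycle callback: whatever mechanism eventually notifies
    // the metrics code of server shutdown would invoke this.
    void onServerStop() {
        // Closing the stream releases the JFR repository chunk file, so a
        // CRaC checkpoint no longer sees an open file descriptor in /tmp.
        recordingStream.close();
    }
}
```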

danielkec (Contributor, Author) commented

  1. We should. What I am afraid of is that pinning is detected by measuring the time before the virtual thread yields.
  2. [4.x] Managed shutdown of JVM #8685

tjquinno (Member) commented

Also, users can set metrics.virtual-threads.enabled=false to completely disable these metrics and bypass the use of JFR entirely.
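
(For readers hitting the CRaC problem today, a hedged example of that override; the YAML nesting below assumes a typical Helidon application.yaml, and the flat key form is the one quoted above.)

```yaml
# Disables the virtual-thread meters entirely, so no JFR RecordingStream is created.
metrics:
  virtual-threads:
    enabled: false
```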

tjquinno (Member) commented

There is also this recent PR about dealing with the server lifecycle: #9655. That issue is specifically about adding an afterStart event, but ideally there would be a way that does not depend on routing to learn of afterStop.
