Helidon should not start JFR recording by default #9687

Open
danielkec opened this issue Jan 24, 2025 · 4 comments

Labels: bug (Something isn't working), metrics, performance

Comments

danielkec (Contributor) commented Jan 24, 2025

It looks like VThreadSystemMetersProvider starts a JFR recording by default, which could have a considerable performance impact. The recording stream is not properly closed on Helidon shutdown.

It also keeps a tmp file open, which blocks CRaC snapshotting.

jdk.internal.crac.mirror.CheckpointException
        Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenFileException: /tmp/2025_01_24_18_29_26_28322/2025_01_24_18_29_26.jfr
                at java.base/jdk.internal.crac.JDKFileResource.lambda$beforeCheckpoint$1(JDKFileResource.java:115)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:170)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:307)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestoreInternal(Core.java:320)
        Caused by: java.lang.Exception: This file descriptor was created by main at epoch:1737739766969 here
                at java.base/jdk.internal.crac.JDKFdResource.<init>(JDKFdResource.java:61)
                at java.base/jdk.internal.crac.JDKFileResource.<init>(JDKFileResource.java:41)
                at java.base/java.io.RandomAccessFile$1.<init>(RandomAccessFile.java:105)
                at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:105)
                at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:265)
                at jdk.jfr/jdk.jfr.internal.SecuritySupport.lambda$createRandomAccessFile$13(SecuritySupport.java:381)
                at jdk.jfr/jdk.jfr.internal.SecuritySupport$1.run(SecuritySupport.java:235)
                at java.base/java.security.AccessController.doPrivileged(AccessController.java:714)
                at jdk.jfr/jdk.jfr.internal.SecuritySupport.doPrivilegedIOWithReturn(SecuritySupport.java:232)
                at jdk.jfr/jdk.jfr.internal.SecuritySupport.createRandomAccessFile(SecuritySupport.java:381)
                at jdk.jfr/jdk.jfr.internal.RepositoryChunk.<init>(RepositoryChunk.java:58)
                at jdk.jfr/jdk.jfr.internal.Repository.newChunk(Repository.java:98)
                at jdk.jfr/jdk.jfr.internal.PlatformRecorder.start(PlatformRecorder.java:236)
                at jdk.jfr/jdk.jfr.internal.PlatformRecording.start(PlatformRecording.java:127)
                at jdk.jfr/jdk.jfr.consumer.RecordingStream.startAsync(RecordingStream.java:379)
                at io.helidon.metrics.systemmeters.VThreadSystemMetersProvider.meterBuilders(VThreadSystemMetersProvider.java:117)
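
(For context, a minimal standalone sketch of the kind of JFR usage involved here. This is not Helidon's actual code; it only assumes the standard jdk.jfr.consumer.RecordingStream API and the two virtual-thread events mentioned in the comments below.)

```java
import java.util.concurrent.atomic.LongAdder;

import jdk.jfr.consumer.RecordingStream;

public class VThreadRecordingSketch {
    public static void main(String[] args) {
        LongAdder pinned = new LongAdder();
        LongAdder submitFailed = new LongAdder();

        // Starting the stream asynchronously makes JFR create a repository
        // chunk file under /tmp, which stays open until the stream is closed.
        RecordingStream rs = new RecordingStream();
        rs.enable("jdk.VirtualThreadPinned");
        rs.enable("jdk.VirtualThreadSubmitFailed");
        rs.onEvent("jdk.VirtualThreadPinned", e -> pinned.increment());
        rs.onEvent("jdk.VirtualThreadSubmitFailed", e -> submitFailed.increment());
        rs.startAsync();

        // ... meters backed by the counters above would be registered here ...

        // Without an explicit close(), the tmp file stays open, which is what
        // the CRaC open-file check reports in the stack trace above.
        rs.close();
    }
}
```
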
tjquinno (Member) commented Jan 24, 2025

  1. It would be good to see numbers quantifying the performance impact in the default case.

    When Helidon creates a RecordingStream for the virtual thread metrics, by default it purposefully subscribes only to two rare events: when a virtual thread is pinned and when a virtual thread cannot be submitted.

    Only if the user changes a default config setting will Helidon also subscribe to virtual thread start and stop events, which could indeed have a significant performance impact. Our documentation warns users about this.

  2. At the time we added the virtual thread metrics there was not a convenient way to know when the server is shutting down. IIRC that notification requires having a routing instance. Although reporting of metrics at /metrics or /observe/metrics of course involves routing, simply registering and internally querying meters does not. As a result, there is no simple and effective way for the metrics code to know when to explicitly close the RecordingStream. (And I don't think we are supposed to be using JVM shutdown hooks in Helidon.)

    This matters only at server shutdown, and any file underlying the recording stream would be closed when the JVM exits anyway.

    It would be great to explicitly close the recording stream during server shutdown. We need a good way to do that, though. If I've missed it I'd love to be corrected.
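
(To illustrate the shape of the explicit close being discussed — a sketch only, assuming some server-stop notification existed; onServerStop is a placeholder name, not an existing Helidon API.)

```java
import jdk.jfr.consumer.RecordingStream;

final class VThreadMetersShutdownSketch {
    private final RecordingStream recordingStream = new RecordingStream();

    // Hypothetical lifecycle callback: whatever mechanism eventually notifies
    // the metrics code of server shutdown would invoke this.
    void onServerStop() {
        // Closing the stream releases the JFR repository chunk file, so a
        // CRaC checkpoint no longer sees an open file descriptor in /tmp.
        recordingStream.close();
    }
}
```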

danielkec (Contributor, Author) commented

  1. We should. What I am afraid of is that pinning is detected by measuring the time before the virtual thread yields.
  2. [4.x] Managed shutdown of JVM #8685

tjquinno (Member) commented

Also, users can set metrics.virtual-threads.enabled=false to completely disable these metrics and bypass the use of JFR entirely.
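
(For readers hitting the CRaC problem today, a hedged example of that override; the YAML nesting below assumes a typical Helidon application.yaml, and the flat key form is the one quoted above.)

```yaml
# Disables the virtual-thread meters entirely, so no JFR RecordingStream is created.
metrics:
  virtual-threads:
    enabled: false
```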

tjquinno (Member) commented

There is also this recent PR about dealing with the server lifecycle: #9655. That issue is specifically about adding an afterStart event, but ideally there would be a way that does not depend on routing to learn of afterStop.
