Metrics support for FT SE #9654

Merged 9 commits on Jan 15, 2025
70 changes: 69 additions & 1 deletion docs/src/main/asciidoc/se/fault-tolerance.adoc
@@ -1,6 +1,6 @@
///////////////////////////////////////////////////////////////////////////////

Copyright (c) 2020, 2024 Oracle and/or its affiliates.
Copyright (c) 2020, 2025 Oracle and/or its affiliates.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -259,6 +259,74 @@ and bulkhead is the last one (the first to be executed once a value is returned)
NOTE: This is the ordering used by the MicroProfile Fault Tolerance implementation
in Helidon when a method is decorated with multiple annotations.

== Metrics

The Helidon Fault Tolerance module has support for some basic metrics to monitor
certain application conditions. Metrics are disabled by default, but can be enabled
via config by setting the property `ft.metrics.default-enabled=true` and by including an actual
metrics implementation on your classpath. For more information about metrics implementations
see xref:{rootdir}/se/metrics/metrics.adoc[Helidon Metrics].

The following tables list all the metrics created by the Fault Tolerance module.
Note that these metrics are generated per command instance, and that each instance _must_
be identified by a unique name --assigned either programmatically by
the application developer or automatically by the API.

[cols="1,2,3"]
.Bulkheads
|===
^|Name ^|Tags ^|Description
|ft.bulkhead.calls.total | name="<bulkhead-name>" | Counter for all calls entering a bulkhead
|ft.bulkhead.waitingDuration | name="<bulkhead-name>" | Distribution summary of waiting times to enter a bulkhead
|ft.bulkhead.executionsRunning | name="<bulkhead-name>" | Gauge whose value is the number of executions running in a bulkhead
|ft.bulkhead.executionsWaiting | name="<bulkhead-name>" | Gauge whose value is the number of executions waiting in a bulkhead
|===

[cols="1,2,3"]
.Circuit Breakers
|===
^|Name ^|Tags ^|Description
|ft.circuitbreaker.calls.total | name="<breaker-name>" | Counter for all calls entering a circuit breaker
|ft.circuitbreaker.opened.total | name="<breaker-name>" | Counter for the number of times a circuit breaker has moved from
closed to open state
|===

[cols="1,2,3"]
.Retries
|===
^|Name ^|Tags ^|Description
|ft.retry.calls.total | name="<retry-name>" | Counter for all calls entering a retry
|ft.retry.retries.total | name="<retry-name>" | Counter for all retried calls, excluding the initial call
|===

[cols="1,2,3"]
.Timeouts
|===
^|Name ^|Tags ^|Description
|ft.timeout.calls.total | name="<timeout-name>" | Counter for all calls entering a timeout
|ft.timeout.executionDuration | name="<timeout-name>" | Distribution summary of all execution durations in a timeout
|===
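
Every metric in the tables above is distinguished only by its `name` tag. One plausible reason for the unique-name requirement is that, in a tag-based registry, two command instances that share a name would also share a series. The sketch below models that with a toy in-memory registry; `TaggedCounterSketch` and its `counter` method are hypothetical and are not part of Helidon, and the assumption that a real registry resolves meters by name plus tags is not confirmed by this PR.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Toy registry keyed by metric name plus "name" tag; hypothetical, not Helidon code. */
final class TaggedCounterSketch {

    // One counter per (metric name, tag value) pair.
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    /** Returns the counter for this metric and instance name, creating it on first use. */
    LongAdder counter(String metricName, String instanceName) {
        String key = metricName + "{name=\"" + instanceName + "\"}";
        return counters.computeIfAbsent(key, k -> new LongAdder());
    }

    public static void main(String[] args) {
        TaggedCounterSketch registry = new TaggedCounterSketch();

        // Two retries with distinct names: each gets its own series.
        registry.counter("ft.retry.calls.total", "retry-a").increment();
        registry.counter("ft.retry.calls.total", "retry-b").increment();

        // A second instance that reuses "retry-a" lands on the same series.
        registry.counter("ft.retry.calls.total", "retry-a").increment();

        System.out.println(registry.counter("ft.retry.calls.total", "retry-a").sum()); // prints 2
        System.out.println(registry.counter("ft.retry.calls.total", "retry-b").sum()); // prints 1
    }
}
```

In this model the count for `retry-a` mixes calls from two different instances, which is the kind of ambiguity a unique name avoids.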

=== Enabling Metrics Programmatically

Metrics can be enabled programmatically either globally or, if disabled globally, individually for
each command instance. To enable metrics globally, call `FaultTolerance.config(Config)` passing
a Config instance that sets `ft.metrics.default-enabled=true`. This must be done on application startup,
before any command instances are created.

If metrics are not enabled globally, they can be enabled programmatically on each command instance
using the `enableMetrics(boolean)` method on its corresponding builder. For example, the
following snippet shows how to create a `Retry` instance of name `my-retry` with metrics
support enabled.

[source,java]
----
include::{sourcedir}/se/FaultToleranceSnippets.java[tag=snippet_8, indent=0]
----

NOTE: The global config setting always takes precedence: that is, if metrics are enabled
globally, they *cannot* be disabled individually by calling `enableMetrics(false)`.
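
The precedence rule in this note is a logical OR of the two flags, which matches the `config.enableMetrics() || MetricsUtils.defaultEnabled()` expression in the `BulkheadImpl` constructor from this PR. A minimal stand-alone sketch of that combination follows; the class `MetricsFlagSketch` and its method are hypothetical names used only for illustration.

```java
/** Hypothetical stand-alone model of the flag combination; not Helidon code. */
final class MetricsFlagSketch {

    private MetricsFlagSketch() {
    }

    /**
     * Combines the global default with the per-instance flag.
     * Metrics are on if either flag is true, so a per-instance
     * false cannot switch off a global true.
     */
    static boolean metricsEnabled(boolean globalDefault, boolean instanceFlag) {
        return instanceFlag || globalDefault;
    }

    public static void main(String[] args) {
        System.out.println(metricsEnabled(false, false)); // prints false
        System.out.println(metricsEnabled(false, true));  // prints true
        System.out.println(metricsEnabled(true, false));  // prints true: global wins
        System.out.println(metricsEnabled(true, true));   // prints true
    }
}
```

The third case is the one the note calls out: `enableMetrics(false)` on a builder has no effect once the global default is `true`.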

== Examples

See <<API>> section for examples.
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2024 Oracle and/or its affiliates.
* Copyright (c) 2024, 2025 Oracle and/or its affiliates.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -127,4 +127,13 @@ <T> void snippet_7() {
T result = builder.build().invoke(this::mayTakeVeryLong);
// end::snippet_7[]
}

<T> void snippet_8() {
// tag::snippet_8[]
Retry retry = Retry.builder()
.name("my-retry")
.enableMetrics(true)
.build();
// end::snippet_8[]
}
}
22 changes: 22 additions & 0 deletions fault-tolerance/fault-tolerance/pom.xml
@@ -48,6 +48,10 @@
<groupId>io.helidon.builder</groupId>
<artifactId>helidon-builder-api</artifactId>
</dependency>
<dependency>
<groupId>io.helidon.metrics</groupId>
<artifactId>helidon-metrics-api</artifactId>
</dependency>
<dependency>
<!--
Used to declare Features in module-info.java
@@ -71,6 +75,16 @@
<artifactId>helidon-common-testing-junit5</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>io.helidon.metrics.providers</groupId>
<artifactId>helidon-metrics-providers-micrometer</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>io.helidon.config</groupId>
<artifactId>helidon-config-yaml</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>io.helidon.logging</groupId>
<artifactId>helidon-logging-jul</artifactId>
@@ -140,6 +154,14 @@
</dependency>
</dependencies>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<forkCount>1</forkCount>
<reuseForks>false</reuseForks>
</configuration>
</plugin>
</plugins>
</build>
</project>
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020, 2024 Oracle and/or its affiliates.
* Copyright (c) 2020, 2025 Oracle and/or its affiliates.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -32,6 +32,32 @@
*/
@RuntimeType.PrototypedBy(BulkheadConfig.class)
public interface Bulkhead extends FtHandler, RuntimeType.Api<BulkheadConfig> {

/**
* Counter for all the calls in a bulkhead.
*/
String FT_BULKHEAD_CALLS_TOTAL = "ft.bulkhead.calls.total";

/**
* Histogram of waiting time to enter a bulkhead.
*/
String FT_BULKHEAD_WAITINGDURATION = "ft.bulkhead.waitingDuration";

/**
* Gauge of number of executions running at a certain time.
*/
String FT_BULKHEAD_EXECUTIONSRUNNING = "ft.bulkhead.executionsRunning";

/**
* Gauge of number of executions waiting at a certain time.
*/
String FT_BULKHEAD_EXECUTIONSWAITING = "ft.bulkhead.executionsWaiting";

/**
* Gauge of number of executions rejected by the bulkhead.
*/
String FT_BULKHEAD_EXECUTIONSREJECTED = "ft.bulkhead.executionsRejected";

/**
* Create {@link Bulkhead} from its configuration.
*
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2023, 2024 Oracle and/or its affiliates.
* Copyright (c) 2023, 2025 Oracle and/or its affiliates.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -78,6 +78,19 @@ interface BulkheadConfigBlueprint extends Prototype.Factory<Bulkhead> {
*/
Optional<String> name();

/**
* Flag to enable metrics for this instance. The value of this flag is
* combined with the global config entry
* {@link io.helidon.faulttolerance.FaultTolerance#FT_METRICS_DEFAULT_ENABLED}.
* If either of these flags is {@code true}, then metrics will be enabled
* for the instance.
*
* @return metrics enabled flag
*/
@Option.Configured
@Option.DefaultBoolean(false)
boolean enableMetrics();

class BuilderDecorator implements Prototype.BuilderDecorator<BulkheadConfig.BuilderBase<?, ?>> {
@Override
public void decorate(BulkheadConfig.BuilderBase<?, ?> target) {
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020, 2024 Oracle and/or its affiliates.
* Copyright (c) 2025 Oracle and/or its affiliates.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -27,11 +27,16 @@
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

import io.helidon.metrics.api.Counter;
import io.helidon.metrics.api.Tag;
import io.helidon.metrics.api.Timer;

class BulkheadImpl implements Bulkhead {
private static final System.Logger LOGGER = System.getLogger(BulkheadImpl.class.getName());

@@ -42,9 +47,14 @@ class BulkheadImpl implements Bulkhead {
private final AtomicLong concurrentExecutions = new AtomicLong(0L);
private final AtomicLong callsAccepted = new AtomicLong(0L);
private final AtomicLong callsRejected = new AtomicLong(0L);
private final AtomicLong callsWaiting = new AtomicLong(0L);
private final List<QueueListener> listeners;
private final Set<Supplier<?>> cancelledSuppliers = new CopyOnWriteArraySet<>();
private final BulkheadConfig config;
private final boolean metricsEnabled;

private Counter callsCounterMetric;
private Timer waitingDurationMetric;

BulkheadImpl(BulkheadConfig config) {
this.inProgress = new Semaphore(config.limit(), true);
@@ -55,6 +65,16 @@ class BulkheadImpl implements Bulkhead {
: new ZeroCapacityQueue();
this.inProgressLock = new ReentrantLock(true);
this.config = config;

this.metricsEnabled = config.enableMetrics() || MetricsUtils.defaultEnabled();
if (metricsEnabled) {
Tag nameTag = Tag.create("name", name);
callsCounterMetric = MetricsUtils.counterBuilder(FT_BULKHEAD_CALLS_TOTAL, nameTag);
waitingDurationMetric = MetricsUtils.timerBuilder(FT_BULKHEAD_WAITINGDURATION, nameTag);
MetricsUtils.gaugeBuilder(FT_BULKHEAD_EXECUTIONSRUNNING, concurrentExecutions::get, nameTag);
MetricsUtils.gaugeBuilder(FT_BULKHEAD_EXECUTIONSWAITING, callsWaiting::get, nameTag);
MetricsUtils.gaugeBuilder(FT_BULKHEAD_EXECUTIONSREJECTED, callsRejected::get, nameTag);
}
}

@Override
@@ -76,6 +96,9 @@ public <T> T invoke(Supplier<? extends T> supplier) {
// execute immediately if semaphore can be acquired
boolean acquired;
try {
if (metricsEnabled) {
callsCounterMetric.increment();
}
acquired = inProgress.tryAcquire();
} catch (Throwable t) {
inProgressLock.unlock();
@@ -105,11 +128,23 @@ public <T> T invoke(Supplier<? extends T> supplier) {
try {
// block current thread until barrier is retracted
Barrier barrier;
long start = 0L;
try {
listeners.forEach(l -> l.enqueueing(supplier));
if (metricsEnabled) {
start = System.nanoTime();
callsWaiting.incrementAndGet();
}
barrier = queue.enqueue(supplier);
} finally {
inProgressLock.unlock(); // we have enqueued, now we can wait
try {
if (metricsEnabled) {
waitingDurationMetric.record(System.nanoTime() - start, TimeUnit.NANOSECONDS);
callsWaiting.decrementAndGet();
}
} finally {
inProgressLock.unlock(); // we have enqueued, now we can wait
}
}

if (barrier == null) {
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020, 2024 Oracle and/or its affiliates.
* Copyright (c) 2020, 2025 Oracle and/or its affiliates.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -31,6 +31,18 @@
*/
@RuntimeType.PrototypedBy(CircuitBreakerConfig.class)
public interface CircuitBreaker extends FtHandler, RuntimeType.Api<CircuitBreakerConfig> {

/**
* Counter for all the calls in a circuit breaker.
*/
String FT_CIRCUITBREAKER_CALLS_TOTAL = "ft.circuitbreaker.calls.total";

/**
* Counter for the number of times a circuit breaker has moved from
* {@link State#CLOSED} to {@link State#OPEN}.
*/
String FT_CIRCUITBREAKER_OPENED_TOTAL = "ft.circuitbreaker.opened.total";

/**
* Create a new circuit breaker based on its configuration.
*
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2023, 2024 Oracle and/or its affiliates.
* Copyright (c) 2023, 2025 Oracle and/or its affiliates.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -102,6 +102,19 @@ interface CircuitBreakerConfigBlueprint extends Prototype.Factory<CircuitBreaker
@Option.Singular
Set<Class<? extends Throwable>> applyOn();

/**
* Flag to enable metrics for this instance. The value of this flag is
* combined with the global config entry
* {@link io.helidon.faulttolerance.FaultTolerance#FT_METRICS_DEFAULT_ENABLED}.
* If either of these flags is {@code true}, then metrics will be enabled
* for the instance.
*
* @return metrics enabled flag
*/
@Option.Configured
@Option.DefaultBoolean(false)
boolean enableMetrics();

class BuilderDecorator implements Prototype.BuilderDecorator<CircuitBreakerConfig.BuilderBase<?, ?>> {
@Override
public void decorate(CircuitBreakerConfig.BuilderBase<?, ?> target) {