Use Caffeine as DataNucleus L2 cache
Users continue to run into situations where the DataNucleus L2 cache grows to excessive sizes, eventually causing the application to become unresponsive, or crash due to OOM.

The default, `Map`-based L2 cache of DataNucleus does not support time- or size-based expiration of entries. Users are thus unable to limit the impact of L2 caching in their environment.

Disabling the L2 cache entirely is possible, but because the default L2 caches don't expose hit/miss metrics, we've been blind to the impact of disabling it so far.

This change switches the default L2 cache implementation to one backed by Caffeine (https://github.com/ben-manes/caffeine). This cache supports time and size expiry policies, and is capable of exposing more detailed metrics.
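
To illustrate what a size bound and event counters add over a plain `Map`, here is a minimal, standalone sketch that uses only the JDK. It is not how Caffeine or DataNucleus work internally: the class `BoundedCacheSketch` is hypothetical, and its least-recently-used policy is an assumption made for illustration (Caffeine's actual eviction policy differs).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustration only: a size-bounded map with least-recently-used eviction
 * and basic event counters. Hypothetical class, not part of any library.
 */
final class BoundedCacheSketch<K, V> {

    private final Map<K, V> entries;
    private long hits;
    private long misses;
    private long evictions;

    BoundedCacheSketch(final int maxSize) {
        // accessOrder=true keeps the least recently used entry first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(final Map.Entry<K, V> eldest) {
                // Called by LinkedHashMap after each insertion.
                final boolean overCapacity = size() > maxSize;
                if (overCapacity) {
                    evictions++;
                }
                return overCapacity;
            }
        };
    }

    /** Returns the cached value, or null on a miss, and counts the outcome. */
    synchronized V get(final K key) {
        final V value = entries.get(key);
        if (value == null) {
            misses++;
        } else {
            hits++;
        }
        return value;
    }

    synchronized void put(final K key, final V value) {
        entries.put(key, value);
    }

    synchronized int size() { return entries.size(); }
    synchronized long hits() { return hits; }
    synchronized long misses() { return misses; }
    synchronized long evictions() { return evictions; }
}
```

With a bound in place, the number of entries can no longer grow past `maxSize`, and the counters make it possible to see how often the cache actually serves a lookup.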

The example Grafana dashboard was updated to include a widget for the newly available metrics. This should make it easier for users to determine whether they can safely disable L2 caching altogether (i.e. when miss rates are high and hit rates are low), and to monitor the effectiveness of their configured expiry policies.
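
As a rough aid for reading the hit and miss series, the share of lookups served from the cache can be computed as hits / (hits + misses). The helper below is a hypothetical sketch, not part of this change. Returning 0.0 when there were no lookups is an arbitrary choice made here.

```java
/** Hypothetical helper for interpreting cache hit and miss counts. */
final class CacheStatsSketch {

    private CacheStatsSketch() {
    }

    /**
     * Fraction of lookups served from the cache, in the range [0.0, 1.0].
     * Returns 0.0 when no lookups were recorded (an arbitrary choice).
     */
    static double hitRatio(final long hits, final long misses) {
        final long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

The same ratio can be formed from per-second rates instead of raw counts, since the time window cancels out. A ratio that stays close to zero under normal load suggests the cache is doing little useful work.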

It should also help us determine whether we can safely turn off L2 caching by default, to avoid such problematic situations for users entirely.

Signed-off-by: nscuro <[email protected]>
nscuro committed Oct 24, 2024
1 parent 2c0755b commit 08994ec
Showing 5 changed files with 341 additions and 15 deletions.
209 changes: 197 additions & 12 deletions dev/monitoring/grafana/dashboards/apiserver.json
@@ -2402,6 +2402,197 @@
      "title": "ORM Cache Entries",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "Prometheus"
      },
      "description": "Number of events per second in the L2 cache of the object relational mapper (ORM).\n\nNote that these metrics are only available when \"alpine.metrics.enabled\" is \"true\", \"alpine.datanucleus.cache.level2.type\" is \"caffeine\", and \"alpine.datanucleus.cache.level2.statisticsenabled\" is \"true\".",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 15,
            "gradientMode": "opacity",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "auto",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "ops"
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "Hits"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "green",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "Misses"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "yellow",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "Puts"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "blue",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "Evictions"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "red",
                  "mode": "fixed"
                }
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 58
      },
      "id": 81,
      "options": {
        "legend": {
          "calcs": [
            "lastNotNull",
            "max",
            "mean"
          ],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "Prometheus"
          },
          "editorMode": "code",
          "expr": "sum(rate(cache_gets_total{instance=\"$instance\", cache=\"datanucleus_second_level\", result=\"hit\"}[1m]))",
          "legendFormat": "Hits",
          "range": true,
          "refId": "A"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "Prometheus"
          },
          "editorMode": "code",
          "expr": "sum(rate(cache_gets_total{instance=\"$instance\", cache=\"datanucleus_second_level\", result=\"miss\"}[1m]))",
          "hide": false,
          "legendFormat": "Misses",
          "range": true,
          "refId": "B"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "Prometheus"
          },
          "editorMode": "code",
          "expr": "sum(rate(cache_puts_total{instance=\"$instance\", cache=\"datanucleus_second_level\"}[1m]))",
          "hide": false,
          "legendFormat": "Puts",
          "range": true,
          "refId": "C"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "Prometheus"
          },
          "editorMode": "code",
          "expr": "sum(rate(cache_evictions_total{instance=\"$instance\", cache=\"datanucleus_second_level\"}[1m]))",
          "hide": false,
          "legendFormat": "Evictions",
          "range": true,
          "refId": "D"
        }
      ],
      "title": "ORM L2 Cache Events",
      "type": "timeseries"
    },
    {
      "collapsed": false,
      "gridPos": {
@@ -3292,8 +3483,7 @@
             "mode": "absolute",
             "steps": [
               {
-                "color": "green",
-                "value": null
+                "color": "green"
               },
               {
                 "color": "red",
@@ -3405,8 +3595,7 @@
             "mode": "absolute",
             "steps": [
               {
-                "color": "green",
-                "value": null
+                "color": "green"
               },
               {
                 "color": "red",
@@ -3503,8 +3692,7 @@
             "mode": "absolute",
             "steps": [
               {
-                "color": "green",
-                "value": null
+                "color": "green"
               },
               {
                 "color": "red",
@@ -3601,8 +3789,7 @@
             "mode": "absolute",
             "steps": [
               {
-                "color": "green",
-                "value": null
+                "color": "green"
               },
               {
                 "color": "red",
@@ -3699,8 +3886,7 @@
             "mode": "absolute",
             "steps": [
               {
-                "color": "green",
-                "value": null
+                "color": "green"
               },
               {
                 "color": "red",
@@ -3797,8 +3983,7 @@
             "mode": "absolute",
             "steps": [
               {
-                "color": "green",
-                "value": null
+                "color": "green"
               },
               {
                 "color": "red",
7 changes: 7 additions & 0 deletions pom.xml
@@ -99,6 +99,7 @@
        <lib.cvss-calculator.version>1.4.2</lib.cvss-calculator.version>
        <lib.owasp-rr-calculator.version>1.0.1</lib.owasp-rr-calculator.version>
        <lib.cyclonedx-java.version>9.1.0</lib.cyclonedx-java.version>
        <lib.datanucleus-cache-caffeine.version>0.1.0</lib.datanucleus-cache-caffeine.version>
        <lib.greenmail.version>2.1.0</lib.greenmail.version>
        <lib.jackson.version>2.18.0</lib.jackson.version>
        <lib.jackson-databind.version>2.18.0</lib.jackson-databind.version>
@@ -199,6 +200,12 @@
            <version>${lib.cyclonedx-java.version}</version>
        </dependency>

        <dependency>
            <groupId>io.github.nscuro</groupId>
            <artifactId>datanucleus-cache-caffeine</artifactId>
            <version>${lib.datanucleus-cache-caffeine.version}</version>
        </dependency>

        <!-- org.json
             This was previously transitively included with Unirest. However, Unirest v3.x removed reliance on org.json
             in favor of their own API compatible replacement. Therefore, it was necessary to directly include org.json.
@@ -0,0 +1,105 @@
/*
 * This file is part of Dependency-Track.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *
 * SPDX-License-Identifier: Apache-2.0
 * Copyright (c) OWASP Foundation. All Rights Reserved.
 */
package org.dependencytrack.observability;

import alpine.common.logging.Logger;
import alpine.server.persistence.PersistenceManagerFactory;
import io.github.nscuro.datanucleus.cache.caffeine.CaffeineLevel2Cache;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.binder.cache.CaffeineCacheMetrics;
import org.datanucleus.api.jdo.JDODataStoreCache;
import org.datanucleus.api.jdo.JDOPersistenceManagerFactory;
import org.datanucleus.cache.Level2Cache;

import javax.jdo.PersistenceManager;
import java.util.concurrent.TimeUnit;

import static org.datanucleus.PropertyNames.PROPERTY_CACHE_L2_STATISTICS_ENABLED;
import static org.datanucleus.PropertyNames.PROPERTY_CACHE_L2_TYPE;

public class MeterRegistryCustomizer implements alpine.common.metrics.MeterRegistryCustomizer {

    private static final Logger LOGGER = Logger.getLogger(MeterRegistryCustomizer.class);

    @Override
    public void accept(final MeterRegistry meterRegistry) {
        maybeRegisterCaffeineLevel2CacheMetrics(meterRegistry);
    }

    /**
     * Register Caffeine-specific metrics for the DataNucleus L2 cache, if and only if
     * Caffeine is configured as L2 cache via {@code datanucleus.cache.level2.type},
     * and L2 cache statistics are enabled.
     * <p>
     * DataNucleus' {@link Level2Cache} doesn't expose any more statistics than the size,
     * but we would like to monitor hit, miss, and eviction metrics.
     */
    @SuppressWarnings("BusyWait")
    private void maybeRegisterCaffeineLevel2CacheMetrics(final MeterRegistry meterRegistry) {
        // The customizer executes before the persistence context is created.
        // Use a separate thread to wait for the persistence context to become available,
        // and register cache metrics once it is.
        //
        // To prevent the thread from waiting forever (should not happen),
        // cap the max wait duration at 15 seconds.
        final long timeoutMs = TimeUnit.SECONDS.toMillis(15);

        final var thread = new Thread(() -> {
            final long startTimeMs = System.currentTimeMillis();

            while ((System.currentTimeMillis() - startTimeMs) < timeoutMs) {
                try (final PersistenceManager pm = PersistenceManagerFactory.createPersistenceManager()) {
                    final var pmf = (JDOPersistenceManagerFactory) pm.getPersistenceManagerFactory();
                    if (!"caffeine".equals(pmf.getProperties().get(PROPERTY_CACHE_L2_TYPE))) {
                        LOGGER.debug("Not registering Caffeine L2 cache metrics, because %s is not \"caffeine\""
                                .formatted(PROPERTY_CACHE_L2_TYPE));
                        return;
                    }
                    if (!Boolean.TRUE.equals(pmf.getProperties().get(PROPERTY_CACHE_L2_STATISTICS_ENABLED))) {
                        LOGGER.debug("Not registering Caffeine L2 cache metrics, because %s is not enabled"
                                .formatted(PROPERTY_CACHE_L2_STATISTICS_ENABLED));
                        return;
                    }

                    final var dataStoreCache = (JDODataStoreCache) pmf.getDataStoreCache();
                    if (dataStoreCache.getLevel2Cache() instanceof final CaffeineLevel2Cache level2Cache) {
                        new CaffeineCacheMetrics<>(
                                level2Cache.getCaffeineCache(),
                                /* cacheName */ "datanucleus_second_level",
                                /* tags */ null)
                                .bindTo(meterRegistry);
                        LOGGER.debug("Registered Caffeine L2 cache metrics");
                    }

                    break;
                } catch (IllegalStateException e) {
                    // Persistence context not created yet.
                }

                try {
                    Thread.sleep(500);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Thread was interrupted while sleeping", e);
                }
            }
        });
        thread.start();
    }

}
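
The wait loop in `maybeRegisterCaffeineLevel2CacheMetrics` follows a general "poll until available, but give up after a deadline" pattern. Below is a standalone sketch of that pattern using only the JDK. The names `PollingSketch` and `pollUntilAvailable` are hypothetical and not part of this commit. Treating `IllegalStateException` as "not ready yet" mirrors the code above, but is otherwise an assumption.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical helper illustrating polling with a deadline. */
final class PollingSketch {

    private PollingSketch() {
    }

    /**
     * Calls the supplier until it returns a (non-null) value, retrying whenever
     * it throws IllegalStateException. Returns an empty Optional when the
     * timeout elapses or the calling thread is interrupted.
     */
    static <T> Optional<T> pollUntilAvailable(
            final Supplier<T> supplier, final Duration timeout, final Duration interval) {
        final long deadlineNanos = System.nanoTime() + timeout.toNanos();

        while (System.nanoTime() - deadlineNanos < 0) {
            try {
                return Optional.of(supplier.get());
            } catch (IllegalStateException e) {
                // Not ready yet; pause below and try again.
            }

            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        }

        return Optional.empty();
    }
}
```

The sketch measures elapsed time with `System.nanoTime()`, which is unaffected by wall-clock adjustments, whereas the commit uses `System.currentTimeMillis()`. For a coarse 15-second cap either is adequate.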
@@ -0,0 +1 @@
org.dependencytrack.observability.MeterRegistryCustomizer
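
A one-line file that names an implementation class has the shape of a `java.util.ServiceLoader` provider-configuration file. The diff as shown here does not confirm that this is how the customizer gets discovered, so the following is only a sketch under that assumption. The interface `RegistryCustomizerSketch` and the class `ServiceDiscoverySketch` are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;
import java.util.function.Consumer;

/** Hypothetical extension point, standing in for a customizer interface. */
interface RegistryCustomizerSketch extends Consumer<StringBuilder> {
}

/** Hypothetical discovery helper built on java.util.ServiceLoader. */
final class ServiceDiscoverySketch {

    private ServiceDiscoverySketch() {
    }

    /**
     * Instantiates every implementation listed on the classpath in a file named
     * META-INF/services/ followed by the interface's fully qualified name.
     * Returns an empty list when no such file lists any implementation.
     */
    static List<RegistryCustomizerSketch> discover() {
        final List<RegistryCustomizerSketch> found = new ArrayList<>();
        for (final RegistryCustomizerSketch customizer
                : ServiceLoader.load(RegistryCustomizerSketch.class)) {
            found.add(customizer);
        }
        return found;
    }
}
```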